Sureberus
Sureberus is a data validation and transformation tool that is useful for validating and normalizing "documents" (nested data structures of basic Python data-types). You provide a schema which describes the expected structure of an object (and optionally, various directives that modify that structure), along with a document to validate and transform, and it returns the new version.
Sureberus is a spiritual descendant of Cerberus and uses more or less the same schema format. There are some differences, though, which you can read about in Differences from Cerberus.
Directives
This chapter provides a reference of all Sureberus schema directives.
allow_unknown
Validation Directive
type: bool
When True, extra keys in a dictionary are passed through silently.
When False, keys that are found in a dictionary but which aren't specified in a fields schema will cause an error to be raised.
allowed
Validation Directive
type: list
of arbitrary Python objects
The object being validated must be equal to one of the objects in the list in order to pass validation.
*of (anyof, oneof)
Meta Directive
type list
of Sureberus schemas
Try applying schemas in sequence to the current value.
These directives should be avoided, and choose_schema should be strongly preferred, if possible.
These directives are generally inefficient and result in hard-to-read error messages.
When anyof is used, then as soon as any schema applies successfully, its result is returned.
When oneof is used, ALL schemas are checked, and if more than one can be applied successfully, an exception is raised (this is very unlikely to be useful; you should probably just use anyof).
In either case, if none of the schemas can be applied without error, then a validation error will be raised.
Unlike Cerberus, these directives allow Transformation Directives to do their work as well. If a schema can be applied successfully, the transformations it applies will be returned.
choose_schema
Meta Directive
type dict
described below
Introduced in Sureberus 0.8.0
Choose a schema based on different factors of the input document and the current Context. See Dynamically selecting schemas for more information.
The directive value is a dictionary which must contain one of the following keys.
choose_schema/when_key_is
type dict containing key, choices, and optionally default_choice
Dynamically selects a schema based on the value of a specific key, specified by the key sub-directive.
For example, if you have a value like {"type": "foo", "foo_specific": "bar"}, where the foo part determines which other keys might exist in the dict (like foo_specific), then this directive can help you choose a specific schema to validate with.
When this directive is applied, it determines a schema to apply by accessing the key named by the key sub-directive in the value (which we'll call the "choice"). If it's not found, then default_choice is used. It then looks up the schema to use by looking for that "choice" in the choices sub-directive.
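The lookup described above can be sketched in plain Python. This only illustrates the selection rule and is not Sureberus's implementation; the helper name pick_schema and the example schemas are hypothetical.

```python
def pick_schema(directive, value):
    """Return the schema a when_key_is-style directive would select."""
    key = directive["key"]
    if key in value:
        # The value stored under the named key is the "choice".
        choice = value[key]
    elif "default_choice" in directive:
        # Fall back to default_choice when the key is missing.
        choice = directive["default_choice"]
    else:
        raise KeyError("key %r not found and no default_choice given" % key)
    return directive["choices"][choice]


directive = {
    "key": "type",
    "default_choice": "foo",
    "choices": {
        "foo": {"fields": {"type": {"type": "string"},
                           "foo_specific": {"type": "string"}}},
        "bar": {"fields": {"type": {"type": "string"},
                           "bar_specific": {"type": "integer"}}},
    },
}

chosen = pick_schema(directive, {"type": "foo", "foo_specific": "bar"})
fallback = pick_schema(directive, {"foo_specific": "bar"})
```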
choose_schema/when_key_exists
type dict
(described below)
Dynamically selects a schema based on whether a certain dict key exists.
The directive should be provided a dictionary, where each key can potentially match a key in the value dictionary. Each value in the directive dictionary should be a Sureberus schema to apply to the dictionary if the key exists in the dictionary.
choose_schema/when_tag_is
type dict containing tag, choices, and optionally default_choice
This is very similar to when_key_is, but instead of choosing a schema based on the value of a dictionary key, it does it by using the context. It goes hand-in-hand with the set_tag or modify_context directives.
When this directive is applied, it determines the schema to apply by looking up a tag named by the tag sub-directive (which we'll call the "choice"). It then looks up the schema to use by looking for that "choice" in the choices sub-directive.
choose_schema/when_type_is
type dict
(described below)
Introduced in Sureberus 0.11
This directive is given a mapping of type names (using the same names that the type
directive takes) to schemas.
A schema is chosen based on the type of the value.
choose_schema/function
type Python callable (value, context)
-> Sureberus schema
Dynamically choose a schema to use based on the current value and the Context object. The schema returned by the Python function will be applied to the value.
coerce
Transformation Directive
type Python callable (value) -> new value
, OR a string naming a registered coerce function
Call a Python function with the value to get a new one to use.
Or, if the directive is a string, look up the registered coerce function to perform coercion.
By default, you can pass "to_list"
or "to_set"
to convert the value to a list or set, if the value is not already a list or set, respectively.
It's important to note that this function is called before all other directives that might reject a value. This is a good directive to use if you want to normalize invalid documents to a form that can be considered valid.
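As a small sketch of a coerce callable, the function below turns digit-only strings into integers so that a later type check can accept them. The function name to_int and the field name count are made up for illustration.

```python
def to_int(value):
    """Turn decimal-digit strings into ints; leave everything else alone."""
    if isinstance(value, str) and value.strip().isdecimal():
        return int(value.strip())
    return value


# Field schemas in the dict style used elsewhere in this document.
# Since coerce is described as running before directives that can reject
# a value, "42" would become 42 before the type check sees it.
fields = {"count": {"coerce": to_int, "type": "integer"}}
```

Values that the function cannot convert are returned unchanged, which leaves it to the remaining directives to reject them.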
coerce_post
Transformation Directive
type Python callable (value) -> new value
, OR a string naming a registered coerce function
Call a Python function with the value to get a new one to use, after all other validation.
Or, if the directive is a string, look up the registered coerce function to perform coercion.
By default, you can pass "to_list"
or "to_set"
to convert the value to a list or set, if the value is not already a list or set, respectively.
Unlike coerce
, this function is applied after all other directives,
so it's allowed to return values that wouldn't validate according to other directives in your schema.
coerce_with_context
Transformation Directive
type Python callable (value, Context) -> new value
, OR a string naming a registered coerce function
Introduced in Sureberus 0.12.0
Call a Python function with the value and the Context to calculate a replacement. Or, if the directive is a string, look up the registered coerce function to perform coercion.
This can be used in tandem with set_tag
or modify_context
to pass data to transformers that are run on deeper parts of the document.
The function can access tags stored in the context with the Context.get_tag(tag_name)
method.
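A minimal sketch of a context-aware coerce function follows. Only Context.get_tag(tag_name) is taken from the description above; the function apply_unit, the unit tag, and the FakeContext stand-in (used so the function can be exercised without Sureberus) are assumptions for illustration.

```python
def apply_unit(value, context):
    """Append the unit stored in the context to a bare number."""
    unit = context.get_tag("unit")
    return "%s %s" % (value, unit)


class FakeContext:
    """Stand-in for trying the function out; NOT Sureberus's Context."""

    def __init__(self, tags):
        self._tags = dict(tags)

    def get_tag(self, tag_name):
        return self._tags[tag_name]


# How the pieces might be wired together in a schema: the tag is set at
# the top of the document and read while coercing deeper elements.
schema = {
    "type": "dict",
    "set_tag": {"tag_name": "unit", "key": "unit"},
    "fields": {
        "unit": {"type": "string"},
        "lengths": {
            "type": "list",
            "elements": {"coerce_with_context": apply_unit},
        },
    },
}

labelled = apply_unit(3, FakeContext({"unit": "cm"}))
```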
coerce_post_with_context
Transformation Directive
type Python callable (value, Context) -> new value
, OR a string naming a registered coerce function
Introduced in Sureberus 0.12.0
Identical to coerce_with_context, but runs after all validation.
coerce_registry
Meta Directive
type dict
of str
(coerce names) to Python callables
Introduced in Sureberus 0.9.0
This allows you to register functions with a name that can be used in the coerce
and coerce_post
directives.
Each key in the directive should be a name, and the value should be a Python function that takes a single argument and returns a new value,
just like the functions you would normally pass to coerce
.
Then you can pass the name of the registered function to coerce
or coerce_post
to invoke the registered function.
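For illustration, a schema that registers one coerce function and refers to it by name in two fields might look like the sketch below. The name strip, the function strip_whitespace, and the field names are hypothetical.

```python
def strip_whitespace(value):
    """Remove surrounding whitespace from strings; pass other values through."""
    return value.strip() if isinstance(value, str) else value


schema = {
    # Register the function once under a name...
    "coerce_registry": {"strip": strip_whitespace},
    "type": "dict",
    "fields": {
        # ...and refer to it by that name wherever it is needed.
        "name": {"type": "string", "coerce": "strip"},
        "title": {"type": "string", "coerce": "strip"},
    },
}
```

Because fields refer to the function by a string, the fields portion of the schema stays plain data.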
debug
Meta Directive
type str
Introduced in Sureberus 0.14.0
Print out some diagnostic information when this schema is being applied. The value given to the directive will be included in the output message.
default_registry
Meta Directive
type dict
of str
(setter names) to Python callables
Introduced in Sureberus 0.9.0
This allows you to register functions with a name that can be used in the default_setter
directive of field schemas.
Each key in the directive should be a name, and the value should be a Python function that acts like a default_setter
function.
Then you can pass the name of the registered function to default_setter
to invoke the registered function.
elements
Meta Directive
type Sureberus schema
Introduced in Sureberus 0.9.0
Apply the given schema to each element in a list or other iterable.
fields
Meta Directive
type dict
of keys to Sureberus schemas
Introduced in Sureberus 0.9.0
When applying a schema with fields
to a dictionary, each key in the value is looked up in the fields
directive,
and used to find a Sureberus schema to apply to the value associated with that key in the dictionary being validated.
Each value is a Sureberus schema that can have a few extra directives, specific to dict fields.
- rename: (string) If this is specified, then the dict key will be renamed to the specified key in the result.
- required: (bool) Indicates whether the field must be present.
- excludes: (list of strings) Specifies a list of keys which must not exist on the dictionary for this schema to validate.
- default: (object) A value to associate with the key in the resulting dict if the key was not present in the input. If you want to default a field to an empty list or dict, do not use default: []. Instead use default_setter: "list".
- default_copy: (object) A value to use as a default if the key is missing, just like default. The difference is that this directive causes a deep copy to be made each time it's inserted into a document, so it's safe to use values like [] and {}.
- default_setter: (Python callable of (dict) -> value, OR a string) A Python function to call if the key was not present in the input. It is passed the dictionary, and its return value will be used as the default. If default_setter is given a string, then it will be used to look up a setter that has been registered with default_registry. By default, you can pass "list", "dict", or "set" to set the default to empty lists, dicts, and sets.
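The advice against default: [] presumably reflects the usual Python hazard of inserting one shared mutable object into many documents. The plain-Python sketch below shows that hazard and how a deep copy avoids it; fill_default is a hypothetical helper, not Sureberus code.

```python
import copy


def fill_default(doc, key, default, deep_copy=False):
    """Insert a default for a missing key, optionally as a fresh deep copy."""
    if key not in doc:
        doc[key] = copy.deepcopy(default) if deep_copy else default
    return doc


# Without copying, both documents end up holding the very same list.
shared = []
a = fill_default({}, "tags", shared)
b = fill_default({}, "tags", shared)
a["tags"].append("x")  # the change is also visible through b["tags"]

# With a deep copy per insertion, each document gets its own list.
template = []
c = fill_default({}, "tags", template, deep_copy=True)
d = fill_default({}, "tags", template, deep_copy=True)
c["tags"].append("x")  # d["tags"] is unaffected
```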
keyschema
Meta Directive
type Sureberus schema
Specify a schema to be applied to all keys in a dictionary.
max
Validation Directive
type Number (or anything that supports the comparison operators)
Raises an exception if the value is greater than the given number.
See also min
.
maxlength
Validation Directive
type Number
Raises an exception if the length of the value is greater than the given number.
minlength
Validation Directive
type Number
Introduced in Sureberus 0.14.0
Raises an exception if the length of the value is less than the given number.
metadata
Meta Directive
type dict
Introduced in Sureberus 0.13.0
This directive is unused by Sureberus. It is meant for embedding application-specific metadata in a Sureberus schema.
min
Validation Directive
type Number (or anything that supports the comparison operators)
Raises an exception if the value is less than the given number.
modify_context
Meta Directive
type Python callable (value, Context) -> Context
Introduced in Sureberus 0.8.0
Run a Python function to allow it to modify the current Context.
The Python function will be passed the value and the current Context, and must return a new Context.
This is most often used to call context.set_tag(key, value)
to add a new tag to the Context,
to later be used with choose_schema
.
See Dynamically selecting schemas for more information.
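A modify_context function might look like the sketch below. It assumes that context.set_tag(key, value) returns a new Context rather than mutating the existing one, which is how the description above reads; remember_type and the FakeContext stand-in are illustrative names only.

```python
def remember_type(value, context):
    """Store the document's "type" value as a tag and return the new Context."""
    return context.set_tag("type", value["type"])


class FakeContext:
    """Stand-in for trying the function out; NOT Sureberus's Context."""

    def __init__(self, tags=None):
        self.tags = dict(tags or {})

    def set_tag(self, key, value):
        # Assumption: set_tag returns a new context instead of mutating.
        new_tags = dict(self.tags)
        new_tags[key] = value
        return FakeContext(new_tags)

    def get_tag(self, key):
        return self.tags[key]


schema = {
    "type": "dict",
    "modify_context": remember_type,
    "fields": {"type": {"type": "string"}},
}

old_context = FakeContext()
new_context = remember_type({"type": "foo"}, old_context)
```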
modify_context_registry
Meta Directive
type dict
of str
(modify_context names) to Python callables
Introduced in Sureberus 0.9.0
This allows you to register functions with a name that can be used in the modify_context
directive.
Each key in the directive should be a name, and the value should be a Python function that acts like a modify_context
function.
Then you can pass the name of the registered function to modify_context
to invoke the registered function.
nullable
Validation Directive
type bool
Specifically allows None, even if it would conflict with other validation directives. If the value is None, no other directives are applied.
This directive differs slightly from Cerberus's implementation, which doesn't honor nullable when a *of directive is present.
See cerberus#373.
regex
Validation Directive
type string (a regex)
If the value is a string, and it does not match the given regex, an exception will be raised. The regex must match the entire string, from beginning to end.
In the future, applying the regex
directive to non-strings will be deprecated.
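The "entire string" rule corresponds to what Python's re.fullmatch does, as opposed to re.match, which only anchors at the start. The sketch below shows the difference using the standard library; it does not claim to show how Sureberus performs the match internally.

```python
import re

pattern = r"[a-z]+"

# The whole string consists of lowercase letters: a full match.
whole = re.fullmatch(pattern, "abc") is not None

# Trailing digits are not covered by the pattern: no full match...
partial = re.fullmatch(pattern, "abc123") is not None

# ...even though a prefix-only match would succeed.
prefix_only = re.match(pattern, "abc123") is not None

# A schema using the same pattern would therefore reject "abc123".
schema = {"type": "string", "regex": pattern}
```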
registry
Meta Directive
type dict
of schema names (strings) to Sureberus schemas
Registers named Sureberus schemas that can be referred to anywhere inside this schema.
This can be useful simply for factoring and schema reuse, but also enables recursive schemas.
To use a registered schema, simply put its name (as a string) any place where you would otherwise have a Sureberus schema.
schema_ref
can also be useful for invoking registered schemas in certain situations.
See Schema registries for more information.
See also the schema_ref directive.
schema_ref
Meta Directive
type string (naming a registered schema)
Applies the named schema (defined in a registry) to the current value. This can be useful if you want to register a schema and use it at the same "level". Most of the time you don't need this, and instead just refer to the named schema by putting the schema name (as a string) anywhere you would normally specify a Sureberus schema.
schema_ref
can also be used as an "inheritance" mechanism: the referred-to schema will be merged in to the schema that has the schema_ref
directive, with the schema_ref
schema taking a lower precedence.
As of Sureberus 0.10, fields defined in a fields directive are also merged together. For example:
This schema is equivalent to one that defines both common_field
and field
in the same fields
directive.
See Schema registries for more information.
schema
Meta Directive
type Varies
The meaning of a schema
key inside a schema changes based on the type of the value. This is strange, but it's how Cerberus did things.
It's much better to use either the fields
directive for dicts, or the elements
directive for lists.
When the value is a list, the directive is interpreted as a Sureberus schema to apply to each element of the list.
When the value is a dict, the keys of the dict are looked up in the directive, and used to find a Sureberus schema to apply to the associated value.
The weird thing is that, e.g., it is possible to define a schema like {'schema': {'type': 'integer'}}
,
without a type
specified along with the schema, so you can try to apply it to lists or dicts.
Since we check the value at runtime, if it is a list, it validates each element of the list with that sub-schema.
If it is a dict, it tries to apply the schema directly as the field-schema, which leads to a runtime error when it tries to interpret the string integer
as a Sureberus schema!
While Sureberus tried to match Cerberus bug-for-bug, this behavior (and the naming of the schema
directive) is just too strange.
This is why Sureberus has introduced fields
and elements
directives. Please use those instead.
set_tag
Meta Directive
type dict
or string (described below)
Introduced in Sureberus 0.8.0
Set a tag on the context. This directive can take various forms:
- "set_tag": {"tag_name": "my-tag", "key": "foo"}
  This sets the tag named my-tag with the value of value["foo"]. So it assumes that the value that the schema is being applied to is a dict.
- "set_tag": "foo"
  This sets the tag named foo with the value of value["foo"]. It's a shorthand for {"tag_name": "foo", "key": "foo"}.
- "set_tag": {"tag_name": "my-tag", "value": "bar"}
  This sets the tag named my-tag with a value of "bar" -- that is, a hardcoded value specified in the schema. This is very rarely useful, but is a convenient shorthand if you are referring to a schema that relies on a tag, in a context where the tag doesn't vary based on anything.
See choose_schema
/when_tag_is
for an example.
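The three forms can be summarized with a small plain-Python sketch that works out which tag name and value each form yields. The helper resolve_set_tag is hypothetical and only mirrors the descriptions above.

```python
def resolve_set_tag(directive, value):
    """Return (tag_name, tag_value) for the three documented set_tag forms."""
    if isinstance(directive, str):
        # Shorthand: the tag name and the dict key are the same string.
        return directive, value[directive]
    if "value" in directive:
        # Hardcoded value taken from the schema itself.
        return directive["tag_name"], directive["value"]
    # Tag value read from a key of the dict being validated.
    return directive["tag_name"], value[directive["key"]]


doc = {"foo": "elephant"}
from_key = resolve_set_tag({"tag_name": "my-tag", "key": "foo"}, doc)
shorthand = resolve_set_tag("foo", doc)
hardcoded = resolve_set_tag({"tag_name": "my-tag", "value": "bar"}, doc)
```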
type
Validation Directive
type string
Raises an exception if the type of the value does not match the directive.
These are the types available:
{
    "none": type(None),
    "integer": six.integer_types,
    "float": (float,) + six.integer_types,
    "number": (float,) + six.integer_types,
    "dict": dict,
    "set": set,
    "list": list,
    "string": six.string_types,
    "boolean": bool,
}
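On Python 3, six.integer_types is (int,) and six.string_types is (str,), so the table above can be read as in the sketch below. The matches helper is hypothetical and uses a plain isinstance check; Sureberus may apply additional rules beyond it.

```python
# The type table above, spelled out for Python 3.
TYPES_PY3 = {
    "none": type(None),
    "integer": (int,),
    "float": (float, int),
    "number": (float, int),
    "dict": dict,
    "set": set,
    "list": list,
    "string": (str,),
    "boolean": bool,
}


def matches(type_name, value):
    """Plain isinstance check against the table."""
    return isinstance(value, TYPES_PY3[type_name])


# Per the table, "float" also accepts integers, but not the reverse.
int_as_float = matches("float", 3)
float_as_int = matches("integer", 3.0)
```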
validator
Validation Directive
type Python callable (field, value, error_func) -> None
, OR a string naming a registered validator.
Invokes a Python function to validate the value.
Or, if the directive is a string, look up the registered validator function to perform validation.
The function should return None if the value is valid, otherwise it should call
error_func(field, "error message")
.
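A validator following the (field, value, error_func) signature might look like this sketch. The function is_even, the field name count, and the collect stand-in for error_func are illustrative only.

```python
def is_even(field, value, error_func):
    """Report an error for odd numbers; implicitly return None otherwise."""
    if value % 2 != 0:
        error_func(field, "must be an even number")


fields = {"count": {"type": "integer", "validator": is_even}}

# Exercise the validator with a stand-in error_func that records calls.
errors = []


def collect(field, message):
    errors.append((field, message))


is_even("count", 4, collect)  # valid: nothing is recorded
is_even("count", 3, collect)  # invalid: one error is recorded
```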
validator_registry
Meta Directive
type dict
of str
(validator names) to Python callables
Introduced in Sureberus 0.9.0
This allows you to register functions with a name that can be used in the validator
directive.
Each key in the directive should be a name, and the value should be a Python function that acts like a validator
function.
Then you can pass the name of the registered function to validator
to invoke the registered function.
valueschema
Meta Directive
type Sureberus schema
Applies the given Sureberus schema to all values in the dictionary (requires the value to be a dictionary).
Schema registries
Small, reusable "chunks" of schema can be defined in-line in the schema specification, instead of requiring Python code to be written which sets up registries. This allows for easy use of recursive schemas at any point in your schema, or just a way to conveniently reuse some subschema in multiple places. For example, here is a schema that validates any nested list of strings:
{
    "registry": {
        "nested_list": {
            "type": "list",
            "elements": {
                "anyof": [
                    {"type": "string"},
                    "nested_list",
                ],
            }
        }
    },
    "type": "dict",
    "fields": {"things": "nested_list"},
}
This will validate data like {"things": ["one", ["two", ["three"]]]}
.
Typically any place you can specify a schema, you can instead specify a string which will be used to find a previously registered schema (references to registered schemas are resolved lexically).
When you need to "merge in" a registered schema, you can use the schema_ref
directive. This can be useful if you want to register a schema and use it at
exactly the same level, for example:
{
    "registry": {
        "nested_list": {
            "type": "list",
            "elements": {"anyof": [{"type": "integer"}, "nested_list"]}
        }
    },
    "schema_ref": "nested_list",
}
This will validate data like [1, [2, [3]]].
Dynamically selecting schemas
Sureberus has a directive for selecting schemas to apply based on various aspects of the input value, called choose_schema
. This directive is meant to be passed a dict, which must include a single sub-directive.
Schema selection based on dict keys: when_key_is, when_key_exists
There are two options for selecting a schema based on dict keys.
- when_key_is is for when you have a dictionary that contains something like a "type" key, whose value lets you identify a specific schema to apply.
- when_key_exists is for when you have a dictionary where different keys appear, and the existence of specific keys allows you to choose a schema to apply.
when_key_is
Use this when you have dictionaries that have a fixed key, such as "type"
,
which specifies some specific format to use. For example, if you have data that
can look like this:
{"type": "elephant", "trunk_length": 60}
{"type": "eagle", "wingspan": 50}
Then you would use when_key_is
in your schema like this (in YAML syntax):
type: dict
choose_schema:
  when_key_is:
    key: "type"
    choices:
      "elephant":
        fields:
          "trunk_length": {"type": "integer"}
      "eagle":
        fields:
          "wingspan": {"type": "integer"}
When the value contains a type
key of elephant
, Sureberus will choose the schema that contains trunk_length
.
When the type is eagle
, it will choose the schema containing wingspan
.
when_key_exists
Use this when you have dictionaries where you must choose the schema based on keys that exist in the data exclusively for their type of data. For example, if you have data that can look like this:
{"image_url": "foo.jpg", "width": 30}
{"color": "red"}
Then you would use when_key_exists
, like this (in YAML):
type: dict
choose_schema:
  when_key_exists:
    "image_url":
      fields:
        "image_url": {"type": "string"}
        "width": {"type": "integer"}
    "color":
      fields:
        "color": {"type": "string"}
Sureberus looks at the keys in the dictionary, and if one of the keys listed under when_key_exists is present, it will choose the corresponding schema.
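The selection rule can be sketched in plain Python as follows. This is an illustration, not Sureberus's code: the helper pick_by_existing_key is hypothetical, and treating zero or several matching keys as an error is an assumption of this sketch.

```python
def pick_by_existing_key(directive, value):
    """Return the schema whose trigger key is present in the value."""
    present = [key for key in directive if key in value]
    if len(present) != 1:
        # Assumption: exactly one listed key is expected to be present.
        raise ValueError("expected exactly one of %r" % sorted(directive))
    return directive[present[0]]


directive = {
    "image_url": {
        "fields": {
            "image_url": {"type": "string"},
            "width": {"type": "integer"},
        }
    },
    "color": {"fields": {"color": {"type": "string"}}},
}

image_schema = pick_by_existing_key(
    directive, {"image_url": "foo.jpg", "width": 30}
)
color_schema = pick_by_existing_key(directive, {"color": "red"})
```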
Schema selection based on context
While when_key_is
can work when you need to vary the way an object is validated or transformed
based on a key existing in that same object, sometimes the relationship of the schema specifier
and the content to be varied is not so tightly bound.
For example, let's take a look at the following data:
{
    "type": "foo",
    "common": {},
    "data_service": {
        "renderers": [
            {"foo_specific": "bar"}
        ]
    }
}
Let's assume that this structure is mostly fixed. We have a type key in the top-level dict, but the only part of the schema that we want to vary is inside the renderers list. If all we have is when_key_is, then we end up duplicating the whole data_service and renderers schemas inside the choices directive of the when_key_is construct.
Sureberus provides a mechanism that allows you to define schemas that vary based on context, even
if that context comes from much higher up in the object. We basically have a way to "remember" the
value of type
, so that it can be used later when applying schemas to values nested arbitrarily
deeply in the object.
There are four directives that provide these mechanisms. For most cases, you only need to care about the first two of them:
- set_tag - save a tag (a key/value pair) in the Context,
- choose_schema with when_tag_is - select a schema based on a saved tag found in the Context,
- modify_context - run an arbitrary Python function that can manipulate the Context (including the tags),
- choose_schema with function - run an arbitrary Python function that can select a schema based on the Context.
The latter two, modify_context and choose_schema with function, are generalizations of the first two, and they don't often need to be used.
Here's an example of a schema that can parse our sample data, using the Python schema syntax.
from sureberus import schema as S

schema = S.Dict(
    set_tag="type",
    fields={
        "type": S.String(),
        "common": S.Dict(),
        "data_service": S.Dict(
            fields={
                "renderers": S.List(
                    elements=S.Dict(
                        choose_schema=S.when_tag_is(
                            "type",
                            {
                                "foo": S.Dict(fields={"foo_specific": S.String()}),
                                "bar": S.Dict(fields={"bar_specific": S.Integer()}),
                            })))})})
Here we're using the set_tag directive with its shorthand for specifying a tag name that will be equivalent to the name of the key to look up in the dict. When Sureberus applies this schema to the top-level dict, it looks for the key named type, and stores its value in the Context under a tag named type.
Then, deeper inside this schema, we make use of the choose_schema directive with the when_tag_is sub-directive. We pass the tag name type here, so it looks up the value associated with the type tag in the Context, and uses that to select the corresponding schema defined in the choices passed to when_tag_is.
Thus, when the top-level dict has "type": "foo", Sureberus will ultimately select the schema containing "foo_specific".
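For comparison, the same schema written as a plain dict might look like the sketch below. It assumes that each S.Dict, S.List, S.String, and S.Integer call corresponds to a dict carrying the matching type directive, and that S.when_tag_is corresponds to a when_tag_is sub-directive with tag and choices keys, as described in the Directives chapter.

```python
schema = {
    "type": "dict",
    "set_tag": "type",  # shorthand: tag "type" takes the value of key "type"
    "fields": {
        "type": {"type": "string"},
        "common": {"type": "dict"},
        "data_service": {
            "type": "dict",
            "fields": {
                "renderers": {
                    "type": "list",
                    "elements": {
                        "type": "dict",
                        "choose_schema": {
                            "when_tag_is": {
                                "tag": "type",
                                "choices": {
                                    "foo": {
                                        "type": "dict",
                                        "fields": {
                                            "foo_specific": {"type": "string"}
                                        },
                                    },
                                    "bar": {
                                        "type": "dict",
                                        "fields": {
                                            "bar_specific": {"type": "integer"}
                                        },
                                    },
                                },
                            }
                        },
                    },
                }
            },
        },
    },
}
```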
Python schema syntax
If you want to construct a schema from Python code instead of storing it as JSON or YAML, Sureberus provides a terser syntax for it.
Here's a standard dict-based schema, using an 80-character limit and strict newline/indent-based line wrapping:
myschema = {
    'type': 'dict',
    'anyof': [
        {'fields': {'gradient': {'type': 'string'}}},
        {
            'fields': {
                'image': {'type': 'string'},
                'opacity': {'type': 'integer', 'default': 100},
            }
        },
    ],
}
And here is a sureberus.schema
-based schema, using the same line-wrapping
rules:
from sureberus.schema import Dict, String, Integer
myschema = Dict(
    anyof=[
        dict(gradient=String()),
        dict(image=String(), opacity=Integer(default=100))
    ]
)
Differences from Cerberus
Transformation AND validation
Sureberus exists because Cerberus wasn't flexible enough for our use. Most importantly, Cerberus strictly separates transformation (what the Cerberus documentation calls "Normalization") from validation; if you want to transform a document with Cerberus, you can't also make sure it's valid at the same time. This can lead to some surprising limitations.
For example,
from sureberus import normalize_dict
from cerberus import Validator
schema = {
    "x": {
        "anyof": [
            {"type": "dict", "schema": {"y": {"type": "integer", "default": 0}}},
            {"type": "integer"},
        ]
    }
}
Here we have a schema that says:
- this is a dict
  - whose x field can either be
    - an integer,
    - or a dict,
      - containing a y field which defaults to 0.
Let's try using it with Sureberus.
assert normalize_dict(schema, {"x": {}}) == {"x": {"y": 0}}
assert normalize_dict(schema, {"x": 5}) == {"x": 5}
These assertions run fine. Sureberus tries to normalize the value with each schema in turn, and returns the result of the first one that succeeds.
Now let's try with Cerberus.
v = Validator(schema)
assert v.normalized({"x": {}}) == {"x": {"y": 0}} # This fails!
assert v.normalized({"x": 5}) == {"x": 5}
The first assertion fails, since Cerberus is returning {'x': {}}
-- it seems to be completely disregarding our default
directive. Why is this?
It's actually deeper than that, still. Let's see what happens when we pass something that obviously shouldn't even validate:
import pytest

# Sureberus:
from sureberus.errors import NoneMatched
with pytest.raises(NoneMatched):
    normalize_dict(schema, {"x": "foo"})
# Cerberus:
with pytest.raises(Exception):  # This fails!
    v.normalized({"x": "foo"})
Cerberus returns the original document without throwing any sort of exception, even though our schema indicates that the x
key must have a value that's either an integer or a dict.
This is expected as per Cerberus's documentation: you have to validate separately from normalization, by using either the validate
method or the normalized
method.
But because it separates these concepts so strictly, and because some directives like anyof
are considered only validation rules and not normalization rules,
it's impossible to express the transformation we want.
Schema Selection
To improve upon the poor error messages that can occur when using "variable schemas" (the oneof
and anyof
directives) in Cerberus,
we've implemented facilities in Sureberus that make it much more clear how to choose schemas, with the choose_schema
directive.
Not only does this make the schema easier to reason about, it makes error messages much nicer: with anyof
, we have to say:
"Sorry, your value didn't match this schema, or that schema, or that schema..."
But with the mechanisms available through choose_schema
, we get to say:
"I know you want to use THIS schema, because you had a field in your dictionary that indicated which schema to use. This is how it doesn't match..."
The choose_schema
facility is documented more thoroughly in Dynamically selecting schemas.
In-line schema registries
In Cerberus, you have to invoke Python code to register schemas.
This means you can't describe a recursive schema without writing custom Python code (as far as I have been able to figure out, anyway).
With Sureberus, you can take advantage of the registry
directive which allows you to declare named schemas.
This means that recursive schemas are easy to define in Sureberus.
See Schema registries for more information.