Sureberus

Sureberus is a data validation and transformation tool that is useful for validating and normalizing "documents" (nested data structures of basic Python data-types). You provide a schema which describes the expected structure of an object (and optionally, various directives that modify that structure), along with a document to validate and transform, and it returns the new version.

Sureberus is a spiritual descendant of Cerberus and uses more or less the same schema format. There are some differences, though, which you can read about in Differences from Cerberus.

Directives

This chapter provides a reference of all Sureberus schema directives.

allow_unknown

Validation Directive
type: bool

When True, extra keys in a dictionary are passed through silently.

When False, keys that are found in a dictionary but which aren't specified in a fields schema will cause an error to be raised.

Example
yaml schema
type: dict
allow_unknown: true
fields:
  known: {type: integer}
Valid input
{"known": 3, "unknown": 4}
Example
yaml schema
type: dict
allow_unknown: false
fields:
  known: {type: integer}
Erroneous input
{"known": 3, "unknown": 4}
Error
<At root: Dict {'known': 3, 'unknown': 4} had unknown fields: {'unknown'}>
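
The unknown-key rule can be modeled in a few lines of plain Python. This is a simplified sketch of the behavior described above, not Sureberus's implementation; `find_unknown_keys` is a hypothetical helper introduced only for illustration.

```python
def find_unknown_keys(value, fields, allow_unknown):
    """Return the keys of `value` that a strict schema would reject.

    Hypothetical helper: models the allow_unknown rule, nothing more.
    """
    if allow_unknown:
        # Extra keys are passed through silently.
        return set()
    # Any key not named in the fields schema counts as unknown.
    return set(value) - set(fields)


fields = {"known": {"type": "integer"}}
doc = {"known": 3, "unknown": 4}

passed_through = find_unknown_keys(doc, fields, allow_unknown=True)
rejected = find_unknown_keys(doc, fields, allow_unknown=False)
```

With `allow_unknown` set to false, the sketch reports `unknown`, which is the key named in the error message above.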

allowed

Validation Directive
type: list of arbitrary Python objects

The object being validated must be equal to one of the objects in the list in order to pass validation.

Example
yaml schema
allowed: ["foo", 1, 2, 3]
Valid input
"foo"
Valid input
2
Erroneous input
5
Error
<At root: Value 5 is not allowed. Must be one of ['foo', 1, 2, 3]>

*of (anyof, oneof)

Meta Directive
type list of Sureberus schemas

Try applying schemas in sequence to the current value.

These directives should be avoided, and choose_schema should be strongly preferred, if possible. These directives are generally inefficient and result in hard-to-read error messages.

When anyof is used, then as soon as any schema applies successfully, its result is returned.

When oneof is used, ALL schemas are checked, and if more than one can be applied successfully, an exception is raised. This is very unlikely to be useful; you should probably just use anyof.

In either case, if none of the schemas can be applied without error, then a validation error will be raised.

Unlike Cerberus, these directives allow Transformation Directives to do their work as well. If a schema can be applied successfully, the transformations it applies will be returned.
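
The difference between the two directives can be sketched in plain Python. In this toy model each "schema" is a hypothetical callable that either returns a (possibly transformed) value or raises `ValueError`; the real library works on schema dicts and raises its own exception types.

```python
def apply_anyof(schemas, value):
    """Return the result of the first schema that applies without error."""
    for schema in schemas:
        try:
            return schema(value)
        except ValueError:
            continue
    raise ValueError("none of the schemas matched")


def apply_oneof(schemas, value):
    """Try every schema; exactly one must apply without error."""
    results = []
    for schema in schemas:
        try:
            results.append(schema(value))
        except ValueError:
            pass
    if len(results) != 1:
        raise ValueError("expected exactly one match, got %d" % len(results))
    return results[0]


def integer(value):
    # bool is excluded only to keep this toy example unambiguous.
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError("not an integer")
    return value


def string_to_int(value):
    if not isinstance(value, str):
        raise ValueError("not a string")
    return int(value)  # a transformation: its result is what gets returned


def non_negative(value):
    if not isinstance(value, int) or value < 0:
        raise ValueError("not a non-negative integer")
    return value
```

Here `apply_anyof([integer, string_to_int], "42")` returns the transformed value of the first schema that succeeds, while `apply_oneof([integer, non_negative], 5)` raises because two schemas match.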

choose_schema

Meta Directive
type dict described below
Introduced in Sureberus 0.8.0

Choose a schema based on different factors of the input document and the current Context. See Dynamically selecting schemas for more information.

The directive value is a dictionary which must contain one of the following keys.

choose_schema/when_key_is

type dict containing key, choices, and optionally default_choice

Dynamically selects a schema based on the value of a specific key, specified by the key sub-directive. For example, if you have a value like {"type": "foo", "foo_specific": "bar"}, where the foo part determines which other keys might exist in the dict (like foo_specific), then this directive can help you choose a specific schema to validate with.

When this directive is applied, it determines a schema to apply by accessing the key named by the key sub-directive in the value (which we'll call the "choice"). If it's not found, then default_choice is used. It then looks up the schema to use by looking for that "choice" in the choices sub-directive.

Example
yaml schema
choose_schema:
  when_key_is:
    key: "chooser"
    choices:
      "choice_a":
        type: dict
        fields:
          a_specific: {type: integer}
      "choice_b":
        type: dict
        fields:
          b_specific: {type: string}
Valid input
{"chooser": "choice_a", "a_specific": 3}
Valid input
{"chooser": "choice_b", "b_specific": "foo"}
Erroneous input
{"chooser": "choice_a", "b_specific": "foo"}
Error
<At root: Dict {'chooser': 'choice_a', 'b_specific': 'foo'} had unknown fields: {'b_specific'}>
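
The lookup described above (read the key, fall back to default_choice, then index into choices) can be sketched as follows. `choose_by_key` is a hypothetical function written for this illustration, and the errors it raises are placeholders rather than Sureberus's own.

```python
def choose_by_key(directive, value):
    """Pick a schema the way when_key_is is described above (simplified)."""
    missing = object()
    choice = value.get(directive["key"], missing)
    if choice is missing:
        if "default_choice" not in directive:
            raise ValueError("key %r not found" % (directive["key"],))
        choice = directive["default_choice"]
    if choice not in directive["choices"]:
        raise ValueError("unknown choice %r" % (choice,))
    return directive["choices"][choice]


directive = {
    "key": "chooser",
    "choices": {
        "choice_a": {"type": "dict", "fields": {"a_specific": {"type": "integer"}}},
        "choice_b": {"type": "dict", "fields": {"b_specific": {"type": "string"}}},
    },
}

chosen = choose_by_key(directive, {"chooser": "choice_a", "a_specific": 3})

# With a default_choice, a document without the key still selects a schema.
with_default = dict(directive, default_choice="choice_b")
fallback = choose_by_key(with_default, {"b_specific": "foo"})
```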

choose_schema/when_key_exists

type dict (described below)

Dynamically selects a schema based on whether a certain dict key exists.

The directive should be provided a dictionary, where each key can potentially match a key in the value dictionary. Each value in the directive dictionary should be a Sureberus schema to apply to the dictionary if the key exists in the dictionary.

Example
yaml schema
choose_schema:
  when_key_exists:
    "keyA":
      type: dict
      fields:
        keyA: {type: string}
        a_related: {type: integer}
    "keyB":
      type: dict
      fields:
        keyB: {type: integer}
        b_related: {type: string}
Valid input
{"keyA": "a_value", "a_related": 33}
Valid input
{"keyB": 50, "b_related": "hi"}
Erroneous input
{"keyB": 50, "a_related": 33}
Error
<At root: Dict {'keyB': 50, 'a_related': 33} had unknown fields: {'a_related'}>
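
A rough model of the selection step, in plain Python. The text above does not say what happens when none of the keys, or several of them, are present, so this sketch simply rejects both cases; that choice is an assumption of the sketch, and `choose_by_existing_key` is a hypothetical name.

```python
def choose_by_existing_key(directive, value):
    """Return the schema whose trigger key is present in `value`."""
    matches = [key for key in directive if key in value]
    if len(matches) != 1:
        # Assumption of this sketch: zero or multiple matches are errors.
        raise ValueError("expected exactly one of %r" % (sorted(directive),))
    return directive[matches[0]]


directive = {
    "keyA": {"type": "dict", "fields": {"keyA": {"type": "string"},
                                        "a_related": {"type": "integer"}}},
    "keyB": {"type": "dict", "fields": {"keyB": {"type": "integer"},
                                        "b_related": {"type": "string"}}},
}

# The erroneous input above selects the keyB schema, which is why
# `a_related` is then reported as an unknown field.
chosen = choose_by_existing_key(directive, {"keyB": 50, "a_related": 33})
```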

choose_schema/when_tag_is

type dict containing tag, choices, and optionally default_choice

This is very similar to when_key_is, but instead of choosing a schema based on the value of a dictionary key, it does it by using the context. It goes hand-in-hand with the set_tag or modify_context directives.

When this directive is applied, it determines the schema to apply by looking up a tag named by the tag sub-directive (which we'll call the "choice"). It then looks up the schema to use by looking for that "choice" in the choices sub-directive.

Example
yaml schema
type: dict
# this `set_tag` sets the `mytag` key with the value
# associated with `obj_type` in the document
set_tag: {tag_name: "mytag", key: "obj_type"}
fields:
  obj_type: {type: string}
  configuration:
    type: dict
    fields:
      config_item:
        # here we're selecting a schema based on
        # something that appears higher up in the
        # document hierarchy.
        choose_schema:
          when_tag_is:
            tag: mytag
            choices:
              "choice_a": {type: integer}
              "choice_b": {type: boolean}
Valid input
{"obj_type": "choice_a", "configuration": {"config_item": 3}}
Valid input
{"obj_type": "choice_b", "configuration": {"config_item": True}}

choose_schema/when_type_is

type dict (described below)
Introduced in Sureberus 0.11

This directive is given a mapping of type names (using the same names that the type directive takes) to schemas. A schema is chosen based on the type of the value.

Example
yaml schema
choose_schema:
  when_type_is:
    list: {elements: {type: integer, min: 0}}
    integer: {type: integer, min: 0}
Valid input
50
Valid input
[50, 60]
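
Selection by type can be pictured as an `isinstance` check against the type names. The mapping below is a small Python 3 subset of the table shown under the type directive, and `choose_by_type` is a hypothetical helper, not part of Sureberus.

```python
# Subset of the type-name table, written with Python 3 built-ins.
TYPE_NAMES = {"list": list, "integer": int, "string": str, "dict": dict}


def choose_by_type(directive, value):
    """Return the schema registered for the type of `value`."""
    for type_name, schema in directive.items():
        if isinstance(value, TYPE_NAMES[type_name]):
            return schema
    raise ValueError("no schema for values of type %s" % type(value).__name__)


directive = {
    "list": {"elements": {"type": "integer", "min": 0}},
    "integer": {"type": "integer", "min": 0},
}

for_int = choose_by_type(directive, 50)
for_list = choose_by_type(directive, [50, 60])
```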

choose_schema/function

type Python callable (value, context) -> Sureberus schema

Dynamically choose a schema to use based on the current value and the Context object. The schema returned by the Python function will be applied to the value.
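
As an illustration, a selector function might look like the sketch below. `pick_schema` is a made-up name, and the schema dict shows where the callable would go under the `function` key described above; the example does not run Sureberus itself.

```python
def pick_schema(value, context):
    """Return a schema dict based on the shape of the current value.

    `context` is unused here; a real function could read tags from it.
    """
    if isinstance(value, list):
        return {"type": "list", "elements": {"type": "integer"}}
    return {"type": "integer"}


schema = {"choose_schema": {"function": pick_schema}}

# Calling the selector directly shows which schema it would hand back.
for_list = pick_schema([1, 2], None)
for_scalar = pick_schema(7, None)
```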

coerce

Transformation Directive
type Python callable (value) -> new value, OR a string naming a registered coerce function

Call a Python function with the value to get a new one to use. Or, if the directive is a string, look up the registered coerce function to perform coercion. By default, you can pass "to_list" or "to_set" to convert the value to a list or set, if the value is not already a list or set, respectively.

It's important to note that this function is called before all other directives that might reject a value. This is a good directive to use if you want to normalize invalid documents to a form that can be considered valid.

Example
python schema
{"type": "integer", "coerce": lambda i: i + 1}
Input
3
Output
4

coerce_post

Transformation Directive
type Python callable (value) -> new value, OR a string naming a registered coerce function

Call a Python function with the value to get a new one to use, after all other validation. Or, if the directive is a string, look up the registered coerce function to perform coercion. By default, you can pass "to_list" or "to_set" to convert the value to a list or set, if the value is not already a list or set, respectively.

Unlike coerce, this function is applied after all other directives, so it's allowed to return values that wouldn't validate according to other directives in your schema.

Example
python schema
{
    "type": "integer",
    # note that this schema does *not* allow None as input,
    # and yet the coerce_post can produce it as output
    "coerce_post": lambda i: None if i == 0 else i
}
Input
1
Output
1
Input
0
Output
None
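
The ordering of coerce, validation, and coerce_post can be modeled with a toy pipeline. `apply_schema` below is a hypothetical stand-in that understands only three directives; it is meant to show the order of operations, not how Sureberus is implemented.

```python
def apply_schema(schema, value):
    """Toy pipeline: coerce first, then check the type, then coerce_post."""
    if "coerce" in schema:
        value = schema["coerce"](value)
    if schema.get("type") == "integer" and not isinstance(value, int):
        raise ValueError("%r must be of integer type" % (value,))
    if "coerce_post" in schema:
        value = schema["coerce_post"](value)
    return value


# coerce runs before validation, so it can repair an invalid input.
repaired = apply_schema({"type": "integer", "coerce": int}, "3")

# coerce_post runs after validation, so its output is never re-checked.
post = apply_schema(
    {"type": "integer", "coerce_post": lambda i: None if i == 0 else i}, 0
)
```

Without the coerce step, the string `"3"` would be rejected by the type check in this model.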

coerce_with_context

Transformation Directive
type Python callable (value, Context) -> new value, OR a string naming a registered coerce function
Introduced in Sureberus 0.12.0

Call a Python function with the value and the Context to calculate a replacement. Or, if the directive is a string, look up the registered coerce function to perform coercion.

This can be used in tandem with set_tag or modify_context to pass data to transformers that are run on deeper parts of the document. The function can access tags stored in the context with the Context.get_tag(tag_name) method.
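
A context-aware coerce function could be shaped like the sketch below. `FakeContext` is a stand-in written for this example that models only `get_tag`; the `unit` tag and `scale_by_unit` are likewise hypothetical.

```python
class FakeContext:
    """Stand-in for Sureberus's Context; only get_tag is modeled."""

    def __init__(self, tags):
        self._tags = dict(tags)

    def get_tag(self, tag_name):
        return self._tags[tag_name]


def scale_by_unit(value, context):
    """Convert a length to millimetres using a `unit` tag set higher up."""
    factor = {"mm": 1, "cm": 10}[context.get_tag("unit")]
    return value * factor


# Where the function would be referenced in a schema.
schema = {"type": "integer", "coerce_with_context": scale_by_unit}

in_mm = scale_by_unit(3, FakeContext({"unit": "cm"}))
```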

coerce_post_with_context

Transformation Directive
type Python callable (value, Context) -> new value, OR a string naming a registered coerce function
Introduced in Sureberus 0.12.0

Identical to coerce_with_context, but runs after all validation.

coerce_registry

Meta Directive
type dict of str (coerce names) to Python callables
Introduced in Sureberus 0.9.0

This allows you to register functions with a name that can be used in the coerce and coerce_post directives. Each key in the directive should be a name, and the value should be a Python function that takes a single argument and returns a new value, just like the functions you would normally pass to coerce. Then you can pass the name of the registered function to coerce or coerce_post to invoke the registered function.

debug

Meta Directive
type str
Introduced in Sureberus 0.14.0

Print out some diagnostic information when this schema is being applied. The value given to the directive will be included in the output message.

default_registry

Meta Directive
type dict of str (setter names) to Python callables
Introduced in Sureberus 0.9.0

This allows you to register functions with a name that can be used in the default_setter directive of field schemas. Each key in the directive should be a name, and the value should be a Python function that acts like a default_setter function. Then you can pass the name of the registered function to default_setter to invoke the registered function.

elements

Meta Directive
type Sureberus schema
Introduced in Sureberus 0.9.0

Apply the given schema to each element in a list or other iterable.

Example
yaml schema
type: list
elements: {type: integer}
Valid input
[50, 60]
Valid input
[]
Erroneous input
[50, "hello"]
Error
<At root[1]: 'hello' must be of integer type>

fields

Meta Directive
type dict of keys to Sureberus schemas
Introduced in Sureberus 0.9.0

When applying a schema with fields to a dictionary, each key in the value is looked up in the fields directive, and used to find a Sureberus schema to apply to the value associated with that key in the dictionary being validated.

Each value is a Sureberus schema that can have a few extra directives, specific to dict fields.

  • rename: (string) If this is specified, then the dict key will be renamed to the specified key in the result.
  • required: (bool) Indicates whether the field must be present.
  • excludes: (list of strings) Specifies a list of keys which must not exist on the dictionary for this schema to validate.
  • default: (object) A value to associate with the key in the resulting dict if the key was not present in the input. If you want to default a field to an empty list or dict, do not use default: []. Instead use default_setter: "list".
  • default_copy: (object) A value to use as a default if the key is missing, just like default. The difference is that this directive causes a deep copy to be made each time it's inserted into a document, so it's safe to use values like [] and {}.
  • default_setter: (Python callable of (dict) -> value, OR a string) A Python function to call if the key was not present in the input. It is passed the dictionary, and its return value will be used as the default. If default_setter is given a string, then it will be used to look up a setter that has been registered with default_registry. By default, you can pass "list", "dict", or "set" to set the default to empty lists, dicts, and sets.
Example
yaml schema
type: dict
fields:
  field1: {type: integer}
  field2: {type: string}
Valid input
{"field1": 42, "field2": "nice"}
Valid input
{}
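
The advice about default versus default_copy comes down to object sharing in Python. The snippet below uses plain dicts and `copy.deepcopy` to show the general hazard; it presumes that a plain default inserts the same object into every document, which the text above implies but does not spell out.

```python
import copy

# default-like behaviour: one list object is shared by every document.
shared_default = []
doc_a = {"tags": shared_default}
doc_b = {"tags": shared_default}
doc_a["tags"].append("x")  # doc_b sees this change as well

# default_copy-like behaviour: each document gets its own deep copy.
template = []
doc_c = {"tags": copy.deepcopy(template)}
doc_d = {"tags": copy.deepcopy(template)}
doc_c["tags"].append("x")  # doc_d is unaffected
```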

keyschema

Meta Directive
type Sureberus schema

Specify a schema to be applied to all keys in a dictionary.

Example
yaml schema
type: dict
keyschema: {type: integer}
Valid input
{42: "hello", -500: None}
Erroneous input
{"hello": 42}
Error
<At root['hello']: 'hello' must be of integer type>

max

Validation Directive
type Number (or anything that supports the comparison operators)

Raises an exception if the value is greater than the given number.

Example
yaml schema
type: integer
max: 50
Valid input
50
Erroneous input
51
Error
<At root: Number 51 is out of bounds, must be at least None and at most 50>

See also min.

maxlength

Validation Directive
type Number

Raises an exception if the length of the value is greater than the given number.

Example
yaml schema
maxlength: 2
Erroneous input
[1,2,3]
Error
<At root: Value [1, 2, 3] is greater than max length of 2>
Erroneous input
"abcdef"
Error
<At root: Value 'abcdef' is greater than max length of 2>

minlength

Validation Directive
type Number
Introduced in Sureberus 0.14.0

Raises an exception if the length of the value is less than the given number.

Example
yaml schema
minlength: 10
Erroneous input
[1,2,3]
Error
<At root: Value [1, 2, 3] is less than min length of 10>
Erroneous input
"abcdef"
Error
<At root: Value 'abcdef' is less than min length of 10>

metadata

Meta Directive
type dict
Introduced in Sureberus 0.13.0

This directive is unused by Sureberus. It is meant for embedding application-specific metadata in a Sureberus schema.

min

Validation Directive
type Number (or anything that supports the comparison operators)

Raises an exception if the value is less than the given number.

Example
yaml schema
type: integer
min: -1
Valid input
-1
Erroneous input
-2
Error
<At root: Number -2 is out of bounds, must be at least -1 and at most None>

modify_context

Meta Directive
type Python callable (value, Context) -> Context
Introduced in Sureberus 0.8.0

Run a Python function to allow it to modify the current Context. The Python function will be passed the value and the current Context, and must return a new Context. This is most often used to call context.set_tag(key, value) to add a new tag to the Context, to later be used with choose_schema.

See Dynamically selecting schemas for more information.
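
A modify_context function might look like the following sketch. `FakeContext` is a stand-in for this example only, and it assumes that `set_tag` returns a new Context rather than mutating the existing one, which is what the "must return a new Context" wording above suggests.

```python
class FakeContext:
    """Stand-in for Sureberus's Context, modeling only tags."""

    def __init__(self, tags=None):
        self._tags = dict(tags or {})

    def set_tag(self, key, value):
        # Assumption: setting a tag yields a new context object.
        new_tags = dict(self._tags)
        new_tags[key] = value
        return FakeContext(new_tags)

    def get_tag(self, key):
        return self._tags[key]


def remember_type(value, context):
    """modify_context-style function: store value["type"] as a tag."""
    return context.set_tag("type", value["type"])


original = FakeContext()
updated = remember_type({"type": "foo"}, original)
```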

modify_context_registry

Meta Directive
type dict of str (modify_context names) to Python callables
Introduced in Sureberus 0.9.0

This allows you to register functions with a name that can be used in the modify_context directive. Each key in the directive should be a name, and the value should be a Python function that acts like a modify_context function. Then you can pass the name of the registered function to modify_context to invoke the registered function.

nullable

Validation Directive
type bool

Specifically allows None, even if it would conflict with other validation directives. If the value is None, no other directives are applied.

This directive differs slightly from Cerberus's implementation, which doesn't honor nullable when a *of directive is present. See cerberus#373.

Example
yaml schema
type: integer
nullable: true
Valid input
None

regex

Validation Directive
type string (a regex)

If the value is a string, and it does not match the given regex, an exception will be raised. The regex must match the entire string, from beginning to end.

In the future, applying the regex directive to non-strings will be deprecated.

Example
yaml schema
regex: "[a-z]+"
Valid input
"foobar"
Erroneous input
"Foobar"
Error
<At root: value does not match regex 'Foobar' '[a-z]+'>
Valid input
3
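
The "entire string" rule corresponds to what Python's `re.fullmatch` does. The helper below is a hypothetical model of the behavior described above, including the pass-through for non-strings; it is not taken from Sureberus.

```python
import re


def matches_entirely(pattern, value):
    """Non-strings pass; strings must match the pattern from start to end."""
    if not isinstance(value, str):
        return True
    return re.fullmatch(pattern, value) is not None


ok = matches_entirely("[a-z]+", "foobar")
capitalized = matches_entirely("[a-z]+", "Foobar")
trailing_digit = matches_entirely("[a-z]+", "foobar1")
non_string = matches_entirely("[a-z]+", 3)
```

A partial match such as `"foobar1"` is rejected in this model because the trailing digit falls outside the pattern.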

registry

Meta Directive
type dict of schema names (strings) to Sureberus schemas

Registers named Sureberus schemas that can be referred to anywhere inside this schema. This can be useful simply for factoring and schema reuse, but also enables recursive schemas. To use a registered schema, simply put its name (as a string) any place where you would otherwise have a Sureberus schema. schema_ref can also be useful for invoking registered schemas in certain situations.

See Schema registries for more information.

See also the schema_ref directive.

Example
yaml schema
registry:
  reusable_schema:
    type: integer
    min: 0
    max: 500
type: dict
fields:
  num1: reusable_schema
  num2: reusable_schema
Valid input
{"num1": 0, "num2": 30}
Example
yaml schema
registry:
  recursive_ints:
    choose_schema:
      when_type_is:
        list: {elements: recursive_ints}
        integer: {}
schema_ref: recursive_ints
Valid input
[]
Valid input
[1, 2]
Valid input
[1, [2, [3, 4]]]

schema_ref

Meta Directive
type string (naming a registered schema)

Applies the named schema (defined in a registry) to the current value. This can be useful if you want to register a schema and use it at the same "level". Most of the time you don't need this, and instead just refer to the named schema by putting the schema name (as a string) anywhere you would normally specify a Sureberus schema.

schema_ref can also be used as an "inheritance" mechanism: the referred-to schema will be merged into the schema that has the schema_ref directive, with the referred-to schema taking lower precedence. As of Sureberus 0.10, fields defined in a fields directive are also merged together. For example:

Example
yaml schema
registry:
  "common":
    type: dict
    fields:
      "common_field": {"type": "string"}
type: dict
schema_ref: "common"
allow_unknown: false
fields:
  "extra_field": {"type": "string"}
Valid input
{"common_field": "foo", "extra_field": "bar"}

This schema is equivalent to one that defines both common_field and extra_field in the same fields directive.

See Schema registries for more information.
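
The merge can be approximated in plain Python. `merge_schema_ref` is a hypothetical helper that assumes the local schema's directives win over the referenced one and that the two fields dicts are combined; it ignores nested references and every other detail of the real implementation.

```python
def merge_schema_ref(schema, registry):
    """Merge the referenced schema under `schema`; local directives win."""
    base = registry[schema["schema_ref"]]
    merged = dict(base)
    merged.update({k: v for k, v in schema.items() if k != "schema_ref"})
    # fields from both schemas are combined rather than replaced.
    fields = dict(base.get("fields", {}))
    fields.update(schema.get("fields", {}))
    if fields:
        merged["fields"] = fields
    return merged


registry = {
    "common": {"type": "dict", "fields": {"common_field": {"type": "string"}}},
}
schema = {
    "type": "dict",
    "schema_ref": "common",
    "allow_unknown": False,
    "fields": {"extra_field": {"type": "string"}},
}

merged = merge_schema_ref(schema, registry)
```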

schema

Meta Directive
type Varies

The meaning of a schema key inside a schema changes based on the type of the value. This is strange, but it's how Cerberus did things. It's much better to use either the fields directive for dicts, or the elements directive for lists.

When the value is a list, the directive is interpreted as a Sureberus schema to apply to each element of the list.

When the value is a dict, the keys of the dict are looked up in the directive, and used to find a Sureberus schema to apply to the associated value.

The weird thing is that, e.g., it is possible to define a schema like {'schema': {'type': 'integer'}}, without a type specified along with the schema, so you can try to apply it to lists or dicts. Since we check the value at runtime, if it is a list, it validates each element of the list with that sub-schema. If it is a dict, it tries to apply the schema directly as the field-schema, which leads to a runtime error when it tries to interpret the string integer as a Sureberus schema!

While Sureberus tried to match Cerberus bug-for-bug, this behavior (and the naming of the schema directive) is just too strange. This is why Sureberus has introduced fields and elements directives. Please use those instead.

set_tag

Meta Directive
type dict or string (described below)
Introduced in Sureberus 0.8.0

Set a tag on the context. This directive can take various forms:

  • "set_tag": {"tag_name": "my-tag", "key": "foo"}

    This sets the tag named my-tag with the value of value["foo"]. So it assumes that the value that the schema is being applied to is a dict.

  • "set_tag": "foo"

    This sets the tag named foo with the value of value["foo"]. It's a shorthand for {"tag_name": "foo", "key": "foo"}.

  • "set_tag": {"tag_name": "my-tag", "value": "bar"}

    This sets the tag named my-tag with a value of "bar" -- that is, a hardcoded value specified in the schema. This is very rarely useful, but is a convenient shorthand if you are referring to a schema that relies on a tag, in a context where the tag doesn't vary based on anything.

See choose_schema/when_tag_is for an example.
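
The three forms listed above reduce to a tag name and a tag value. The function below is a hypothetical sketch of that resolution step, written only to make the shorthand explicit.

```python
def resolve_set_tag(directive, value):
    """Return (tag_name, tag_value) for any of the three set_tag forms."""
    if isinstance(directive, str):
        # Shorthand: tag name and dict key are the same string.
        return directive, value[directive]
    if "key" in directive:
        # Tag value is read from the dict being validated.
        return directive["tag_name"], value[directive["key"]]
    # Hardcoded value given in the schema itself.
    return directive["tag_name"], directive["value"]


doc = {"foo": 42}

from_key = resolve_set_tag({"tag_name": "my-tag", "key": "foo"}, doc)
shorthand = resolve_set_tag("foo", doc)
hardcoded = resolve_set_tag({"tag_name": "my-tag", "value": "bar"}, doc)
```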

type

Validation Directive
type string

Raises an exception if the type of the value does not match the directive.

These are the types available:

{
    "none": type(None),
    "integer": six.integer_types,
    "float": (float,) + six.integer_types,
    "number": (float,) + six.integer_types,
    "dict": dict,
    "set": set,
    "list": list,
    "string": six.string_types,
    "boolean": bool,
}
Example
yaml schema
{type: integer}
Valid input
3
Erroneous input
"3"
Error
<At root: '3' must be of integer type>

validator

Validation Directive
type Python callable (field, value, error_func) -> None, OR a string naming a registered validator.

Invokes a Python function to validate the value. Or, if the directive is a string, looks up the registered validator function to perform validation. The function should return None if the value is valid; otherwise it should call error_func(field, "error message").
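
A validator with the (field, value, error_func) signature could be written as in this sketch. `must_be_even` and the recording `error_func` are made up for the example; the validator is exercised directly rather than through Sureberus.

```python
def must_be_even(field, value, error_func):
    """Validator: report an error for odd numbers, return None otherwise."""
    if value % 2 != 0:
        error_func(field, "must be an even number")


# Where the function would be referenced in a schema.
schema = {"type": "integer", "validator": must_be_even}

# Stand-in error_func that records what the validator reports.
errors = []


def record_error(field, message):
    errors.append((field, message))


must_be_even("count", 4, record_error)  # valid: nothing recorded
must_be_even("count", 5, record_error)  # invalid: one error recorded
```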

validator_registry

Meta Directive
type dict of str (validator names) to Python callables
Introduced in Sureberus 0.9.0

This allows you to register functions with a name that can be used in the validator directive. Each key in the directive should be a name, and the value should be a Python function that acts like a validator function. Then you can pass the name of the registered function to validator to invoke the registered function.

valueschema

Meta Directive
type Sureberus schema

Applies the given Sureberus schema to all values in the dictionary (requires the value to be a dictionary).

Example
yaml schema
type: dict
valueschema: {type: integer}
Valid input
{"foo": 3, "bar": 5}
Erroneous input
{"foo": "3"}
Error
<At root['foo']: '3' must be of integer type>

Schema registries

Small, reusable "chunks" of schema can be defined in-line in the schema specification, instead of requiring Python code to be written which sets up registries. This allows for easy use of recursive schemas at any point in your schema, or just a way to conveniently reuse some subschema in multiple places. For example, here is a schema that validates any nested list of strings:

{
    "registry": {
        "nested_list": {
            "type": "list",
            "elements": {
                "anyof": [
                    {"type": "string"},
                    "nested_list",
                ],
            }
        }
    },
    "type": "dict",
    "fields": {"things": "nested_list"},
}

This will validate data like {"things": ["one", ["two", ["three"]]]}.

Typically any place you can specify a schema, you can instead specify a string which will be used to find a previously registered schema (references to registered schemas are resolved lexically).

When you need to "merge in" a registered schema, you can use the schema_ref directive. This can be useful if you want to register a schema and use it at exactly the same level, for example:

{
    "registry": {
        "nested_list": {
            "type": "list",
            "elements": {"anyof": [{"type": "integer"}, "nested_list"]}
        }
    },
    "schema_ref": "nested_list",
}

This will validate data like [1, [2, [3]]].

Dynamically selecting schemas

Sureberus has a directive for selecting schemas to apply based on various aspects of the input value, called choose_schema. This directive is meant to be passed a dict, which must include a single sub-directive.

Schema selection based on dict keys: when_key_is, when_key_exists

There are two options for selecting a schema based on dict keys.

  • when_key_is is for when you have a dictionary that contains something like a "type" key, whose value lets you identify a specific schema to apply.
  • when_key_exists is for when you have a dictionary where different keys appear, and the existence of specific keys allows you to choose a schema to apply.

when_key_is

Use this when you have dictionaries that have a fixed key, such as "type", which specifies some specific format to use. For example, if you have data that can look like this:

{"type": "elephant", "trunk_length": 60}
{"type": "eagle", "wingspan": 50}

Then you would use when_key_is in your schema like this (in YAML syntax):

type: dict
choose_schema:
  when_key_is:
    key: "type"
    choices:
      "elephant":
        fields:
          "trunk_length": {"type": "integer"}
      "eagle":
        fields:
          "wingspan": {"type": "integer"}

When the value contains a type key of elephant, Sureberus will choose the schema that contains trunk_length. When the type is eagle, it will choose the schema containing wingspan.

when_key_exists

Use this when you have dictionaries where you must choose the schema based on keys that exist in the data exclusively for their type of data. For example, if you have data that can look like this:

{"image_url": "foo.jpg", "width": 30}
{"color": "red"}

Then you would use when_key_exists, like this (in YAML):

type: dict
choose_schema:
  when_key_exists:
    "image_url":
      fields:
        "image_url": {"type": "string"}
        "width": {"type": "integer"}
    "color":
      fields:
        "color": {"type": "string"}

Sureberus looks at the keys in the dictionary, and if one of the keys listed under when_key_exists is present, it chooses the corresponding schema.

Schema selection based on context

While when_key_is can work when you need to vary the way an object is validated or transformed based on a key existing in that same object, sometimes the relationship of the schema specifier and the content to be varied is not so tightly bound.

For example, let's take a look at the following data:

{
    "type": "foo",
    "common": {},
    "data_service": {
        "renderers": [
            {"foo_specific": "bar"}
        ]
    }
}

Let's assume that this structure is mostly fixed. We have a type key in the top-level dict, but the only part of the schema that we want to vary is inside the renderers list. If all we have is when_key_is, then we end up duplicating the whole data_service and renderers schemas inside the choices directive of the when_key_is construct.

Sureberus provides a mechanism that allows you to define schemas that vary based on context, even if that context comes from much higher up in the object. We basically have a way to "remember" the value of type, so that it can be used later when applying schemas to values nested arbitrarily deeply in the object.

There are four directives that provide these mechanisms. For most cases, you only need to care about the first two of them:

  • set_tag - save a tag (a key/value pair) in the Context,
  • choose_schema with when_tag_is - select a schema based on a saved tag found in the Context,
  • modify_context - run an arbitrary Python function that can manipulate the Context (including the tags),
  • choose_schema with function - run an arbitrary Python function that can select a schema based on the Context.

The latter two, modify_context and choose_schema with function, are generalizations of the first two, and they don't often need to be used.

Here's an example of a schema that can parse our sample data, using the Python schema syntax.

from sureberus import schema as S

schema = S.Dict(
  set_tag="type",
  fields={
    "type": S.String(),
    "common": S.Dict(),
    "data_service": S.Dict(
      fields={
        "renderers": S.List(
          elements=S.Dict(
            choose_schema=S.when_tag_is(
              "type",
              {
                "foo": S.Dict(fields={"foo_specific": S.String()}),
                "bar": S.Dict(fields={"bar_specific": S.Integer()}),
              })))})})

Here we're using the set_tag directive with its shorthand for specifying a tag name that will be equivalent to the name of the key to look up in the dict. When Sureberus applies this schema to the top-level dict, it looks for the key named type, and stores its value in the Context under a tag named type. Then, deeper inside this schema, we make use of the choose_schema directive with the when_tag_is sub-directive. We pass the tag name type here, so it looks up the value associated with the type tag in the Context, and uses that to select the corresponding schema defined in the choices passed to when_tag_is. Thus, when the top-level dict has "type": "foo", Sureberus will ultimately select the schema containing "foo_specific".

Python schema syntax

If you want to construct a schema from Python code instead of storing it as JSON or YAML, Sureberus provides a terser syntax for it.

Here's a standard dict-based schema, using an 80-character limit and strict newline/indent-based line wrapping:

myschema = {
    'type': 'dict',
    'anyof': [
        {'fields': {'gradient': {'type': 'string'}}},
        {
            'fields': {
                'image': {'type': 'string'},
                'opacity': {'type': 'integer', 'default': 100},
            }
        },
    ],
}

And here is a sureberus.schema-based schema, using the same line-wrapping rules:

from sureberus.schema import Dict, String, Integer
myschema = Dict(
    anyof=[
        dict(gradient=String()),
        dict(image=String(), opacity=Integer(default=100))
    ]
)

Differences from Cerberus

Transformation AND validation

Sureberus exists because Cerberus wasn't flexible enough for our use. Most importantly, Cerberus strictly separates transformation (what the Cerberus documentation calls "Normalization") from validation; if you want to transform a document with Cerberus, you can't also make sure it's valid at the same time. This can lead to some surprising limitations.

For example,

from sureberus import normalize_dict
from cerberus import Validator

schema = {
    "x": {
        "anyof": [
            {"type": "dict", "schema": {"y": {"type": "integer", "default": 0}}},
            {"type": "integer"},
        ]
    }
}

Here we have a schema that says:

  • this is a dict
    • whose x field can either be
      • an integer,
      • or a dict,
        • containing a y field which defaults to 0.

Let's try using it with Sureberus.

assert normalize_dict(schema, {"x": {}}) == {"x": {"y": 0}}
assert normalize_dict(schema, {"x": 5}) == {"x": 5}

These assertions run fine. Sureberus tries to normalize the value with each schema in turn, and returns the result of the first one that succeeds.

Now let's try with Cerberus.

v = Validator(schema)
assert v.normalized({"x": {}}) == {"x": {"y": 0}} # This fails!
assert v.normalized({"x": 5}) == {"x": 5}

The first assertion fails, since Cerberus is returning {'x': {}} -- it seems to be completely disregarding our default directive. Why is this?

It's actually deeper than that, still. Let's see what happens when we pass something that obviously shouldn't even validate:


# Sureberus:
import pytest
from sureberus.errors import NoneMatched
with pytest.raises(NoneMatched):
    normalize_dict(schema, {"x": "foo"})

# Cerberus:
with pytest.raises(Exception): # This fails!
    v.normalized({"x": "foo"})

Cerberus returns the original document without throwing any sort of exception, even though our schema indicates that the x key must have a value that's either an integer or a dict. This is expected as per Cerberus's documentation: you have to validate separately from normalization, by using either the validate method or the normalized method. But because it separates these concepts so strictly, and because some directives like anyof are considered only validation rules and not normalization rules, it's impossible to express the transformation we want.

Schema Selection

To improve upon the poor error messages that can occur when using "variable schemas" (the oneof and anyof directives) in Cerberus, we've implemented facilities in Sureberus that make it much more clear how to choose schemas, with the choose_schema directive.

Not only does this make the schema easier to reason about, it makes error messages much nicer: with anyof, we have to say:

"Sorry, your value didn't match this schema, or that schema, or that schema..."

But with the mechanisms available through choose_schema, we get to say:

"I know you want to use THIS schema, because you had a field in your dictionary that indicated which schema to use. This is how it doesn't match..."

The choose_schema facility is documented more thoroughly in Dynamically selecting schemas.

In-line schema registries

In Cerberus, you have to invoke Python code to register schemas. This means you can't describe a recursive schema without writing custom Python code (as far as I have been able to figure out, anyway). With Sureberus, you can take advantage of the registry directive which allows you to declare named schemas. This means that recursive schemas are easy to define in Sureberus. See Schema registries for more information.