json

A simple class for flexible schema definition and usage.

JSONSchema

Bases: Schema[JSONType, JSON_Schema_T, JSON_blob_T]

A flexible mixin Schema class for easy definition of flexible, readable schemas.

To use this class, define a subclass with the desired fields as dataclass fields. Fields will be re-mapped to JSON schema types via the PYTHON_TO_JSON dictionary. The resulting class can then be used to validate JSON blobs and query schemas, or used for type-safe dictionary-like usage of data conforming to the schema.

Examples:

>>> class Data(JSONSchema):
...     allow_extra_columns: ClassVar[bool] = True
...     subject_id: int
...     time: datetime
...     code: str
...     numeric_value: float | None = None
...     text_value: str | None = None

Once defined, you can access the schema’s columns and their types via prescribed member variables:

>>> Data.subject_id_name
'subject_id'
>>> Data.subject_id_dtype
{'type': 'integer'}
>>> Data.time_name
'time'
>>> Data.time_dtype
{'type': 'string', 'format': 'date-time'}

You can also produce a JSON schema for the class:

>>> Data.schema() # doctest: +NORMALIZE_WHITESPACE
{'type': 'object',
 'properties': {'subject_id': {'type': 'integer'},
                'time': {'type': 'string', 'format': 'date-time'},
                'code': {'type': 'string'},
                'numeric_value': {'type': 'number'},
                'text_value': {'type': 'string'}},
 'required': ['subject_id', 'time', 'code'],
 'additionalProperties': True}
>>> try:
...     Draft202012Validator.check_schema(Data.schema())
...     print("Returned schema is valid!")
... except Exception as e:
...     print(f"Returned schema is invalid")
...     raise e
Returned schema is valid!

You can also check a query schema against this schema with the validate method. This method accounts for optional columns and for whether the schema is open or closed (that is, whether it allows extra columns):

>>> query_schema = {
...     "type": "object",
...     "properties": {
...         "subject_id": {"type": "integer"},
...         "time": {"type": "string", "format": "date-time"},
...         "code": {"type": "string"},
...         "foobar": {"type": "string"},
...     },
...     "required": ["subject_id", "time", "code"],
... }
>>> try:
...     Data.validate(query_schema)
...     print("Schema is valid")
... except Exception as e:
...     print(f"Schema is invalid")
...     raise e
Schema is valid
>>> Data.allow_extra_columns = False
>>> Data.validate(query_schema)
Traceback (most recent call last):
    ...
flexible_schema.exceptions.SchemaValidationError: Disallowed extra columns: foobar
>>> query_schema = {
...     "type": "object",
...     "properties": {
...         "subject_id": {"type": "integer"},
...         "time": {"type": "string", "format": "date-time"},
...         "code": {"type": "string"},
...         "numeric_value": {"type": "string"},
...     },
... }
>>> Data.validate(query_schema)
Traceback (most recent call last):
    ...
flexible_schema.exceptions.SchemaValidationError:
    Columns with incorrect types: numeric_value (want {'type': 'number'}, got {'type': 'string'})
>>> query_schema = {
...     "type": "object",
...     "properties": {
...         "subject_id": {"type": "integer"},
...         "time": {"type": "string", "format": "date-time"},
...         "numeric_value": {"type": "number"},
...     },
... }
>>> Data.validate(query_schema)
Traceback (most recent call last):
    ...
flexible_schema.exceptions.SchemaValidationError: Missing required columns: code
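
The three failure modes above (extra columns, mismatched types, missing required columns) can be sketched with plain dictionaries. This is only an illustration of the checks, not the library's code: `check_query_schema` is a hypothetical helper, and it returns a list of messages rather than raising `SchemaValidationError`.

```python
from typing import Any


def check_query_schema(reference: dict[str, Any], query: dict[str, Any]) -> list[str]:
    """Return the problems found; an empty list means the query schema is compatible."""
    ref_props = reference["properties"]
    query_props = query["properties"]
    problems: list[str] = []

    # Required columns must appear among the query schema's properties.
    missing = [c for c in reference.get("required", []) if c not in query_props]
    if missing:
        problems.append(f"Missing required columns: {', '.join(missing)}")

    # Extra columns are only a problem when the reference schema is closed.
    if not reference.get("additionalProperties", True):
        extra = [c for c in query_props if c not in ref_props]
        if extra:
            problems.append(f"Disallowed extra columns: {', '.join(extra)}")

    # Columns present in both schemas must agree on their type.
    wrong = [
        f"{c} (want {ref_props[c]}, got {t})"
        for c, t in query_props.items()
        if c in ref_props and t != ref_props[c]
    ]
    if wrong:
        problems.append(f"Columns with incorrect types: {', '.join(wrong)}")

    return problems


REFERENCE = {
    "type": "object",
    "properties": {
        "subject_id": {"type": "integer"},
        "time": {"type": "string", "format": "date-time"},
        "code": {"type": "string"},
        "numeric_value": {"type": "number"},
    },
    "required": ["subject_id", "time", "code"],
    "additionalProperties": False,
}

missing_code = {"type": "object", "properties": {"subject_id": {"type": "integer"}}}
result = check_query_schema(REFERENCE, missing_code)
```

Optional columns such as `numeric_value` may be absent from the query schema without being reported, which matches the behavior shown in the examples above.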

You can also validate against a JSON blob:

>>> Data.validate({"subject_id": 1, "time": "2023-10-01T00:00:00", "code": "A"})
>>> Data.allow_extra_columns = True
>>> Data.validate({"subject_id": 1, "time": "2023-10-01T00:00:00", "code": "A", "extra": "extra"})
>>> Data.allow_extra_columns = False
>>> Data.validate({"subject_id": 1, "time": "2023-10-01T00:00:00", "code": "A", "extra": "extra"})
Traceback (most recent call last):
    ...
flexible_schema.exceptions.TableValidationError: Table validation failed
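
According to the source listing, blob validation is delegated to the `jsonschema` package. As a rough standard-library approximation of the checks exercised above (required keys, extra keys, and basic scalar types), consider the following sketch. `blob_problems` and `JSON_TO_PYTHON` are hypothetical names, and the sketch covers only a small part of what a full JSON Schema validator does.

```python
from typing import Any

# Scalar JSON schema types and the Python types accepted for them (illustrative).
JSON_TO_PYTHON = {"integer": int, "number": (int, float), "string": str, "boolean": bool}


def blob_problems(blob: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems found when checking a JSON blob against a schema."""
    props = schema.get("properties", {})
    problems = [f"missing required key: {k}" for k in schema.get("required", []) if k not in blob]

    # A closed schema rejects keys that are not declared in "properties".
    if schema.get("additionalProperties", True) is False:
        problems += [f"unexpected key: {k}" for k in blob if k not in props]

    for key, value in blob.items():
        expected = JSON_TO_PYTHON.get(props.get(key, {}).get("type"))
        if expected is None:
            continue  # undeclared key or a type this sketch does not handle
        # bool is a subclass of int in Python, but JSON Schema keeps them separate.
        is_bool_mismatch = isinstance(value, bool) and expected is not bool
        if is_bool_mismatch or not isinstance(value, expected):
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems


SCHEMA = {
    "type": "object",
    "properties": {
        "subject_id": {"type": "integer"},
        "time": {"type": "string", "format": "date-time"},
        "code": {"type": "string"},
    },
    "required": ["subject_id", "time", "code"],
    "additionalProperties": False,
}

ok = blob_problems({"subject_id": 1, "time": "2023-10-01T00:00:00", "code": "A"}, SCHEMA)
bad = blob_problems({"subject_id": 1, "time": "2023-10-01T00:00:00", "code": "A", "extra": "extra"}, SCHEMA)
```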

Validation will fail if the passed object is neither a table nor a schema:

>>> Data.validate("foobar")
Traceback (most recent call last):
    ...
TypeError: Expected a schema or table, but got: str

Alignment is not supported in JSONSchema:

>>> Data.align({"subject_id": 1, "time": "2023-10-01T00:00:00", "code": "A"})
Traceback (most recent call last):
    ...
NotImplementedError: JSONSchema does not support alignment

You can also use this class as a dataclass for type-safe usage of data conforming to this schema:

>>> Data(subject_id=1, time=datetime(2023, 10, 1), code="A")
Data(subject_id=1,
     time=datetime.datetime(2023, 10, 1, 0, 0),
     code='A',
     numeric_value=None,
     text_value=None)
Source code in flexible_schema/json.py
class JSONSchema(Schema[JSONType, JSON_Schema_T, JSON_blob_T]):
    """A flexible mixin Schema class for easy definition of flexible, readable schemas.

    To use this class, define a subclass with the desired fields as dataclass fields. Fields will be
    re-mapped to JSON schema types via the `PYTHON_TO_JSON` dictionary. The resulting class can then be
    used to validate JSON blobs and query schemas, or used for type-safe dictionary-like usage of data
    conforming to the schema.

    Examples:
        >>> class Data(JSONSchema):
        ...     allow_extra_columns: ClassVar[bool] = True
        ...     subject_id: int
        ...     time: datetime
        ...     code: str
        ...     numeric_value: float | None = None
        ...     text_value: str | None = None

    Once defined, you can access the schema's columns and their types via prescribed member variables:

        >>> Data.subject_id_name
        'subject_id'
        >>> Data.subject_id_dtype
        {'type': 'integer'}
        >>> Data.time_name
        'time'
        >>> Data.time_dtype
        {'type': 'string', 'format': 'date-time'}

    You can also produce a JSON schema for the class:

        >>> Data.schema() # doctest: +NORMALIZE_WHITESPACE
        {'type': 'object',
         'properties': {'subject_id': {'type': 'integer'},
                        'time': {'type': 'string', 'format': 'date-time'},
                        'code': {'type': 'string'},
                        'numeric_value': {'type': 'number'},
                        'text_value': {'type': 'string'}},
         'required': ['subject_id', 'time', 'code'],
         'additionalProperties': True}
        >>> try:
        ...     Draft202012Validator.check_schema(Data.schema())
        ...     print("Returned schema is valid!")
        ... except Exception as e:
        ...     print(f"Returned schema is invalid")
        ...     raise e
        Returned schema is valid!

    You can also check a query schema against this schema with the `validate` method. This method
    accounts for optional columns and for whether the schema is open or closed (that is, whether it
    allows extra columns):

        >>> query_schema = {
        ...     "type": "object",
        ...     "properties": {
        ...         "subject_id": {"type": "integer"},
        ...         "time": {"type": "string", "format": "date-time"},
        ...         "code": {"type": "string"},
        ...         "foobar": {"type": "string"},
        ...     },
        ...     "required": ["subject_id", "time", "code"],
        ... }
        >>> try:
        ...     Data.validate(query_schema)
        ...     print("Schema is valid")
        ... except Exception as e:
        ...     print(f"Schema is invalid")
        ...     raise e
        Schema is valid
        >>> Data.allow_extra_columns = False
        >>> Data.validate(query_schema)
        Traceback (most recent call last):
            ...
        flexible_schema.exceptions.SchemaValidationError: Disallowed extra columns: foobar
        >>> query_schema = {
        ...     "type": "object",
        ...     "properties": {
        ...         "subject_id": {"type": "integer"},
        ...         "time": {"type": "string", "format": "date-time"},
        ...         "code": {"type": "string"},
        ...         "numeric_value": {"type": "string"},
        ...     },
        ... }
        >>> Data.validate(query_schema)
        Traceback (most recent call last):
            ...
        flexible_schema.exceptions.SchemaValidationError:
            Columns with incorrect types: numeric_value (want {'type': 'number'}, got {'type': 'string'})
        >>> query_schema = {
        ...     "type": "object",
        ...     "properties": {
        ...         "subject_id": {"type": "integer"},
        ...         "time": {"type": "string", "format": "date-time"},
        ...         "numeric_value": {"type": "number"},
        ...     },
        ... }
        >>> Data.validate(query_schema)
        Traceback (most recent call last):
            ...
        flexible_schema.exceptions.SchemaValidationError: Missing required columns: code

    You can also validate against a JSON blob:

        >>> Data.validate({"subject_id": 1, "time": "2023-10-01T00:00:00", "code": "A"})
        >>> Data.allow_extra_columns = True
        >>> Data.validate({"subject_id": 1, "time": "2023-10-01T00:00:00", "code": "A", "extra": "extra"})
        >>> Data.allow_extra_columns = False
        >>> Data.validate({"subject_id": 1, "time": "2023-10-01T00:00:00", "code": "A", "extra": "extra"})
        Traceback (most recent call last):
            ...
        flexible_schema.exceptions.TableValidationError: Table validation failed

    Validation will fail if the passed object is neither a table nor a schema:

        >>> Data.validate("foobar")
        Traceback (most recent call last):
            ...
        TypeError: Expected a schema or table, but got: str

    Alignment is not supported in JSONSchema:

        >>> Data.align({"subject_id": 1, "time": "2023-10-01T00:00:00", "code": "A"})
        Traceback (most recent call last):
            ...
        NotImplementedError: JSONSchema does not support alignment

    You can also use this class as a dataclass for type-safe usage of data conforming to this schema:

        >>> Data(subject_id=1, time=datetime(2023, 10, 1), code="A")
        Data(subject_id=1,
             time=datetime.datetime(2023, 10, 1, 0, 0),
             code='A',
             numeric_value=None,
             text_value=None)
    """

    PYTHON_TO_JSON: ClassVar[dict[Any, str]] = {
        int: "integer",
        float: "number",
        str: "string",
        bool: "boolean",
    }

    @classmethod
    def map_type(cls, field_type: Any) -> JSONType:
        """Map a Python type to a JSON schema type.

        Args:
            field_type: The Python type to map.

        Returns:
            The JSON schema type, as a dictionary (e.g., `{'type': 'integer'}`).

        Raises:
            ValueError: If the type is not supported.

        Examples:
            >>> JSONSchema.map_type(int)
            {'type': 'integer'}
            >>> JSONSchema.map_type(list[float])
            {'type': 'array', 'items': {'type': 'number'}}
            >>> JSONSchema.map_type(str)
            {'type': 'string'}
            >>> JSONSchema.map_type(list[datetime])
            {'type': 'array', 'items': {'type': 'string', 'format': 'date-time'}}
            >>> JSONSchema.map_type("integer")
            {'type': 'integer'}
            >>> JSONSchema.map_type((int, str))
            Traceback (most recent call last):
                ...
            ValueError: Unsupported type: (<class 'int'>, <class 'str'>)
        """

        origin = get_origin(field_type)

        if origin is list:
            args = get_args(field_type)
            return {"type": "array", "items": cls.map_type(args[0])}
        elif field_type is datetime or origin is datetime:
            return {"type": "string", "format": "date-time"}
        elif field_type in cls.PYTHON_TO_JSON:
            return {"type": cls.PYTHON_TO_JSON[field_type]}
        elif isinstance(field_type, str):
            return {"type": field_type}
        else:
            raise ValueError(f"Unsupported type: {field_type}")

    @classmethod
    def _inv_map_type(cls, json_type: JSONType) -> Any:
        """Inverse map a JSON schema type to a Python type.

        Args:
            json_type: The JSON schema type to map.

        Returns:
            The Python type.

        Raises:
            ValueError: If the type is not supported.

        Examples:
            >>> JSONSchema._inv_map_type({"type": "integer"})
            <class 'int'>
            >>> JSONSchema._inv_map_type({"type": "string"})
            <class 'str'>
            >>> JSONSchema._inv_map_type({"type": "number"})
            <class 'float'>
            >>> JSONSchema._inv_map_type({"type": "array", "items": {"type": "integer"}})
            list[int]
            >>> JSONSchema._inv_map_type({"type": "string", "format": "date-time"})
            <class 'datetime.datetime'>
            >>> JSONSchema._inv_map_type({"type": "object"})
            Traceback (most recent call last):
                ...
            ValueError: Unsupported type: {'type': 'object'}
        """

        if json_type["type"] == "array":
            return list[cls._inv_map_type(json_type["items"])]
        elif json_type["type"] == "string" and json_type.get("format") == "date-time":
            return datetime
        elif json_type["type"] in cls.PYTHON_TO_JSON.values():
            return {v: k for k, v in cls.PYTHON_TO_JSON.items()}[json_type["type"]]
        else:
            raise ValueError(f"Unsupported type: {json_type}")

    @classmethod
    def schema(cls) -> dict[str, Any]:
        schema_properties = {}
        required_fields = []

        for c in cls._columns():
            schema_properties[c.name] = c.dtype

            if c.is_required:
                required_fields.append(c.name)

        schema = {
            "type": "object",
            "properties": schema_properties,
            "required": required_fields,
            "additionalProperties": cls.allow_extra_columns,
        }

        return schema

    @classmethod
    def _is_raw_table(cls, arg: Any) -> bool:
        """Check if the argument is a raw table (e.g., of type `RawTable_T`).

        Args:
            arg: The argument to check.

        Returns:
            True if the argument is a table, False otherwise.

        Examples:
            >>> JSONSchema._is_raw_table({"subject_id": 1, "time": "2023-10-01T00:00:00Z", "code": "A"})
            True
            >>> JSONSchema._is_raw_table({"subject_id": 1, "time": datetime(2012, 12, 1), "code": 1})
            True
            >>> JSONSchema._is_raw_table("foobar")
            False
            >>> JSONSchema._is_raw_table({1: 2, 3: 4})
            False
        """

        return isinstance(arg, dict) and all(isinstance(k, str) for k in arg)

    @classmethod
    def _is_raw_schema(cls, arg: Any) -> bool:
        """Check if the argument is a schema.

        Args:
            arg: The argument to check.

        Returns:
            True if the argument is a schema, False otherwise.

        Examples:
            >>> JSONSchema._is_raw_schema(
            ...     {"type": "object", "properties": {"subject_id": {"type": "integer"}}}
            ... )
            True
            >>> JSONSchema._is_raw_schema({"subject_id": 1})
            False
            >>> JSONSchema._is_raw_schema({"type": "object"})
            False
            >>> JSONSchema._is_raw_schema({"properties": {}})
            False
            >>> JSONSchema._is_raw_schema({"type": "str", "properties": {}})
            False
            >>> JSONSchema._is_raw_schema({"type": "object", "properties": []})
            False
            >>> JSONSchema._is_raw_schema({"type": "object", "properties": {}})
            True
            >>> JSONSchema._is_raw_schema("foobar")
            False
            >>> JSONSchema._is_raw_schema({1: 2, 3: 4})
            False
            >>> JSONSchema._is_raw_schema({"type": "object", "properties": {}, "title": 33})
            False
        """

        if (
            not isinstance(arg, dict)
            or ("type" not in arg)
            or ("properties" not in arg)
            or arg["type"] != "object"
            or not isinstance(arg.get("properties", None), dict)
        ):
            return False

        try:
            Draft202012Validator.check_schema(arg)
            return True
        except SchemaError as e:
            logger.debug(f"JSON query schema is invalid: {e}")
            return False

    @classmethod
    def _raw_schema_cols(cls, schema: JSON_Schema_T) -> list[str]:
        """Get all columns in the schema."""
        return list(schema["properties"].keys())

    @classmethod
    def _raw_schema_col_type(cls, schema: JSON_Schema_T, col: str) -> dict[str, Any]:
        """Get the type of a column in the schema."""
        return schema["properties"][col]

    @classmethod
    def _validate_table(cls, table: JSON_blob_T):
        """Validate the table against the schema."""
        validate(instance=table, schema=cls.schema())

    @classmethod
    def _raw_table_schema(cls, table: dict) -> Any:  # pragma: no cover
        raise NotImplementedError("JSONSchema does not support _raw_table_schema")

    @classmethod
    def _reorder_raw_table(cls, table: JSON_blob_T, table_order: list[str]) -> JSON_blob_T:
        """Reorder the columns of a "table" (JSON blob) to a target list.

        Args:
            table: The JSON blob to reorder.
            table_order: The order to set the columns in.

        Returns:
            The reordered JSON blob.

        Examples:
            >>> JSONSchema._reorder_raw_table({"foo": 1, "bar": 2}, ["bar", "foo"])
            {'bar': 2, 'foo': 1}
        """
        return {k: table[k] for k in table_order}

    @classmethod
    def _cast_raw_table_column(cls, table: JSON_blob_T, col: str, col_type: JSONType) -> JSON_blob_T:
        """Cast a column in the "table" (JSON blob) to the specified type.

        Args:
            table: The JSON blob to cast.
            col: The column to cast.
            col_type: The type to cast the column to.

        Returns:
            The JSON blob with the casted column.

        Examples:
            >>> JSONSchema._cast_raw_table_column({"foo": 1, "bar": 2}, "foo", {"type": "string"})
            {'foo': '1', 'bar': 2}
            >>> JSONSchema._cast_raw_table_column(
            ...     {"foo": 1, "bar": "1234"}, "bar", {"type": "array", "items": {"type": "integer"}}
            ... )
            {'foo': 1, 'bar': [1, 2, 3, 4]}
            >>> JSONSchema._cast_raw_table_column(
            ...     {"foo": "2023-10-01T00:00:00"}, "foo", {"type": "string", "format": "date-time"}
            ... )
            {'foo': datetime.datetime(2023, 10, 1, 0, 0)}
            >>> JSONSchema._cast_raw_table_column(
            ...     {"foo": 1, "bar": "1234"}, "foo", {"type": "array", "items": {"type": "integer"}}
            ... )
            Traceback (most recent call last):
                ...
            ValueError: Column foo can't be casted to {'type': 'array', 'items': {'type': 'integer'}}: 1
        """
        out = {**table}
        try:
            out[col] = cls.__cast_raw_val(table[col], col_type)
        except Exception as e:
            raise ValueError(f"Column {col} can't be casted to {col_type}: {table[col]}") from e
        return out

    @classmethod
    def __cast_raw_val(cls, in_val: Any, col_type: JSONType) -> Any:
        inv_type = cls._inv_map_type(col_type)

        if inv_type is datetime:
            return datetime.fromisoformat(in_val)
        elif col_type["type"] == "array":
            return [cls.__cast_raw_val(v, col_type["items"]) for v in in_val]
        else:
            return inv_type(in_val)

    @classmethod
    def align(cls, table: JSON_blob_T) -> JSON_blob_T:
        raise NotImplementedError("JSONSchema does not support alignment")

    @classmethod
    def _any_null(cls, table: JSON_blob_T, col: str) -> bool:
        """Checks if any value in the table at the given column is None.

        This isn't used in JSON, but we keep it to match the interface.

        Examples:
            >>> class Sample(JSONSchema):
            ...     subject_id: int
            >>> Sample._any_null({"subject_id": 1}, "subject_id")
            False
            >>> Sample._any_null({"subject_id": None}, "subject_id")
            True
            >>> Sample._any_null({}, "subject_id")
            True
        """
        return table.get(col, None) is None

    _all_null = _any_null

_any_null(table, col) classmethod

Checks if any value in the table at the given column is None.

This isn’t used in JSON, but we keep it to match the interface.

Examples:

>>> class Sample(JSONSchema):
...     subject_id: int
>>> Sample._any_null({"subject_id": 1}, "subject_id")
False
>>> Sample._any_null({"subject_id": None}, "subject_id")
True
>>> Sample._any_null({}, "subject_id")
True
Source code in flexible_schema/json.py
@classmethod
def _any_null(cls, table: JSON_blob_T, col: str) -> bool:
    """Checks if any value in the table at the given column is None.

    This isn't used in JSON, but we keep it to match the interface.

    Examples:
        >>> class Sample(JSONSchema):
        ...     subject_id: int
        >>> Sample._any_null({"subject_id": 1}, "subject_id")
        False
        >>> Sample._any_null({"subject_id": None}, "subject_id")
        True
        >>> Sample._any_null({}, "subject_id")
        True
    """
    return table.get(col, None) is None

_cast_raw_table_column(table, col, col_type) classmethod

Cast a column in the “table” (JSON blob) to the specified type.

Parameters:

Name      Type         Description                      Default
table     JSON_blob_T  The JSON blob to cast.           required
col       str          The column to cast.              required
col_type  JSONType     The type to cast the column to.  required

Returns:

Type         Description
JSON_blob_T  The JSON blob with the casted column.

Examples:

>>> JSONSchema._cast_raw_table_column({"foo": 1, "bar": 2}, "foo", {"type": "string"})
{'foo': '1', 'bar': 2}
>>> JSONSchema._cast_raw_table_column(
...     {"foo": 1, "bar": "1234"}, "bar", {"type": "array", "items": {"type": "integer"}}
... )
{'foo': 1, 'bar': [1, 2, 3, 4]}
>>> JSONSchema._cast_raw_table_column(
...     {"foo": "2023-10-01T00:00:00"}, "foo", {"type": "string", "format": "date-time"}
... )
{'foo': datetime.datetime(2023, 10, 1, 0, 0)}
>>> JSONSchema._cast_raw_table_column(
...     {"foo": 1, "bar": "1234"}, "foo", {"type": "array", "items": {"type": "integer"}}
... )
Traceback (most recent call last):
    ...
ValueError: Column foo can't be casted to {'type': 'array', 'items': {'type': 'integer'}}: 1
Source code in flexible_schema/json.py
@classmethod
def _cast_raw_table_column(cls, table: JSON_blob_T, col: str, col_type: JSONType) -> JSON_blob_T:
    """Cast a column in the "table" (JSON blob) to the specified type.

    Args:
        table: The JSON blob to cast.
        col: The column to cast.
        col_type: The type to cast the column to.

    Returns:
        The JSON blob with the casted column.

    Examples:
        >>> JSONSchema._cast_raw_table_column({"foo": 1, "bar": 2}, "foo", {"type": "string"})
        {'foo': '1', 'bar': 2}
        >>> JSONSchema._cast_raw_table_column(
        ...     {"foo": 1, "bar": "1234"}, "bar", {"type": "array", "items": {"type": "integer"}}
        ... )
        {'foo': 1, 'bar': [1, 2, 3, 4]}
        >>> JSONSchema._cast_raw_table_column(
        ...     {"foo": "2023-10-01T00:00:00"}, "foo", {"type": "string", "format": "date-time"}
        ... )
        {'foo': datetime.datetime(2023, 10, 1, 0, 0)}
        >>> JSONSchema._cast_raw_table_column(
        ...     {"foo": 1, "bar": "1234"}, "foo", {"type": "array", "items": {"type": "integer"}}
        ... )
        Traceback (most recent call last):
            ...
        ValueError: Column foo can't be casted to {'type': 'array', 'items': {'type': 'integer'}}: 1
    """
    out = {**table}
    try:
        out[col] = cls.__cast_raw_val(table[col], col_type)
    except Exception as e:
        raise ValueError(f"Column {col} can't be casted to {col_type}: {table[col]}") from e
    return out
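
The recursive casting behind this method can be sketched without the library as follows. `cast_value` and `cast_column` are hypothetical stand-ins that mirror the logic in the listing above, and `JSON_TO_PYTHON` is an illustrative inverse of the `PYTHON_TO_JSON` table.

```python
from datetime import datetime
from typing import Any

# Illustrative inverse of the PYTHON_TO_JSON table.
JSON_TO_PYTHON = {"integer": int, "number": float, "string": str, "boolean": bool}


def cast_value(value: Any, json_type: dict[str, Any]) -> Any:
    """Cast a single value to the Python type implied by a JSON schema type."""
    if json_type["type"] == "array":
        # Any iterable is accepted, so a string is cast character by character.
        return [cast_value(v, json_type["items"]) for v in value]
    if json_type["type"] == "string" and json_type.get("format") == "date-time":
        return datetime.fromisoformat(value)
    return JSON_TO_PYTHON[json_type["type"]](value)


def cast_column(blob: dict[str, Any], col: str, json_type: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of the blob with one key cast; the input is left untouched."""
    out = dict(blob)
    out[col] = cast_value(blob[col], json_type)
    return out


original = {"foo": 1, "bar": "1234"}
casted = cast_column(original, "bar", {"type": "array", "items": {"type": "integer"}})
```

Two consequences of casting by calling the Python type directly are worth keeping in mind: a string cast to an array is split into characters, and any non-empty string cast to `boolean` becomes `True`, because `bool("false")` is `True` in Python.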

_inv_map_type(json_type) classmethod

Inverse map a JSON schema type to a Python type.

Parameters:

Name       Type      Description                   Default
json_type  JSONType  The JSON schema type to map.  required

Returns:

Type  Description
Any   The Python type.

Raises:

Type        Description
ValueError  If the type is not supported.

Examples:

>>> JSONSchema._inv_map_type({"type": "integer"})
<class 'int'>
>>> JSONSchema._inv_map_type({"type": "string"})
<class 'str'>
>>> JSONSchema._inv_map_type({"type": "number"})
<class 'float'>
>>> JSONSchema._inv_map_type({"type": "array", "items": {"type": "integer"}})
list[int]
>>> JSONSchema._inv_map_type({"type": "string", "format": "date-time"})
<class 'datetime.datetime'>
>>> JSONSchema._inv_map_type({"type": "object"})
Traceback (most recent call last):
    ...
ValueError: Unsupported type: {'type': 'object'}
Source code in flexible_schema/json.py
@classmethod
def _inv_map_type(cls, json_type: JSONType) -> Any:
    """Inverse map a JSON schema type to a Python type.

    Args:
        json_type: The JSON schema type to map.

    Returns:
        The Python type.

    Raises:
        ValueError: If the type is not supported.

    Examples:
        >>> JSONSchema._inv_map_type({"type": "integer"})
        <class 'int'>
        >>> JSONSchema._inv_map_type({"type": "string"})
        <class 'str'>
        >>> JSONSchema._inv_map_type({"type": "number"})
        <class 'float'>
        >>> JSONSchema._inv_map_type({"type": "array", "items": {"type": "integer"}})
        list[int]
        >>> JSONSchema._inv_map_type({"type": "string", "format": "date-time"})
        <class 'datetime.datetime'>
        >>> JSONSchema._inv_map_type({"type": "object"})
        Traceback (most recent call last):
            ...
        ValueError: Unsupported type: {'type': 'object'}
    """

    if json_type["type"] == "array":
        return list[cls._inv_map_type(json_type["items"])]
    elif json_type["type"] == "string" and json_type.get("format") == "date-time":
        return datetime
    elif json_type["type"] in cls.PYTHON_TO_JSON.values():
        return {v: k for k, v in cls.PYTHON_TO_JSON.items()}[json_type["type"]]
    else:
        raise ValueError(f"Unsupported type: {json_type}")
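
The listing above rebuilds the inverted dictionary on every call. A small variation, shown here as a sketch with hypothetical names (`JSON_TO_PYTHON`, `inverse_map`), inverts the table once and shows that nested arrays come back as nested `list[...]` aliases.

```python
from datetime import datetime
from typing import Any, get_args, get_origin

PYTHON_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}
# Inverted once at import time; this is safe because the mapping is one-to-one.
JSON_TO_PYTHON = {v: k for k, v in PYTHON_TO_JSON.items()}


def inverse_map(json_type: dict[str, Any]) -> Any:
    """Map a JSON schema type back to a Python type."""
    kind = json_type["type"]
    if kind == "array":
        # Subscripting `list` at runtime produces a generic alias such as list[int].
        return list[inverse_map(json_type["items"])]
    if kind == "string" and json_type.get("format") == "date-time":
        return datetime
    if kind in JSON_TO_PYTHON:
        return JSON_TO_PYTHON[kind]
    raise ValueError(f"Unsupported type: {json_type}")


nested = inverse_map({"type": "array", "items": {"type": "array", "items": {"type": "number"}}})
```

The result can be inspected with `typing.get_origin` and `typing.get_args`, which is how the forward mapping in `map_type` recognizes list types.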

_is_raw_schema(arg) classmethod

Check if the argument is a schema.

Parameters:

Name  Type  Description             Default
arg   Any   The argument to check.  required

Returns:

Type  Description
bool  True if the argument is a schema, False otherwise.

Examples:

>>> JSONSchema._is_raw_schema(
...     {"type": "object", "properties": {"subject_id": {"type": "integer"}}}
... )
True
>>> JSONSchema._is_raw_schema({"subject_id": 1})
False
>>> JSONSchema._is_raw_schema({"type": "object"})
False
>>> JSONSchema._is_raw_schema({"properties": {}})
False
>>> JSONSchema._is_raw_schema({"type": "str", "properties": {}})
False
>>> JSONSchema._is_raw_schema({"type": "object", "properties": []})
False
>>> JSONSchema._is_raw_schema({"type": "object", "properties": {}})
True
>>> JSONSchema._is_raw_schema("foobar")
False
>>> JSONSchema._is_raw_schema({1: 2, 3: 4})
False
>>> JSONSchema._is_raw_schema({"type": "object", "properties": {}, "title": 33})
False
Source code in flexible_schema/json.py
@classmethod
def _is_raw_schema(cls, arg: Any) -> bool:
    """Check if the argument is a schema.

    Args:
        arg: The argument to check.

    Returns:
        True if the argument is a schema, False otherwise.

    Examples:
        >>> JSONSchema._is_raw_schema(
        ...     {"type": "object", "properties": {"subject_id": {"type": "integer"}}}
        ... )
        True
        >>> JSONSchema._is_raw_schema({"subject_id": 1})
        False
        >>> JSONSchema._is_raw_schema({"type": "object"})
        False
        >>> JSONSchema._is_raw_schema({"properties": {}})
        False
        >>> JSONSchema._is_raw_schema({"type": "str", "properties": {}})
        False
        >>> JSONSchema._is_raw_schema({"type": "object", "properties": []})
        False
        >>> JSONSchema._is_raw_schema({"type": "object", "properties": {}})
        True
        >>> JSONSchema._is_raw_schema("foobar")
        False
        >>> JSONSchema._is_raw_schema({1: 2, 3: 4})
        False
        >>> JSONSchema._is_raw_schema({"type": "object", "properties": {}, "title": 33})
        False
    """

    if (
        not isinstance(arg, dict)
        or ("type" not in arg)
        or ("properties" not in arg)
        or arg["type"] != "object"
        or not isinstance(arg.get("properties", None), dict)
    ):
        return False

    try:
        Draft202012Validator.check_schema(arg)
        return True
    except SchemaError as e:
        logger.debug(f"JSON query schema is invalid: {e}")
        return False

_is_raw_table(arg) classmethod

Check if the argument is a raw table (e.g., of type RawTable_T).

Parameters:

Name  Type  Description             Default
arg   Any   The argument to check.  required

Returns:

Type  Description
bool  True if the argument is a table, False otherwise.

Examples:

>>> JSONSchema._is_raw_table({"subject_id": 1, "time": "2023-10-01T00:00:00Z", "code": "A"})
True
>>> JSONSchema._is_raw_table({"subject_id": 1, "time": datetime(2012, 12, 1), "code": 1})
True
>>> JSONSchema._is_raw_table("foobar")
False
>>> JSONSchema._is_raw_table({1: 2, 3: 4})
False
Source code in flexible_schema/json.py
@classmethod
def _is_raw_table(cls, arg: Any) -> bool:
    """Check if the argument is a raw table (e.g., of type `RawTable_T`).

    Args:
        arg: The argument to check.

    Returns:
        True if the argument is a table, False otherwise.

    Examples:
        >>> JSONSchema._is_raw_table({"subject_id": 1, "time": "2023-10-01T00:00:00Z", "code": "A"})
        True
        >>> JSONSchema._is_raw_table({"subject_id": 1, "time": datetime(2012, 12, 1), "code": 1})
        True
        >>> JSONSchema._is_raw_table("foobar")
        False
        >>> JSONSchema._is_raw_table({1: 2, 3: 4})
        False
    """

    return not (not isinstance(arg, dict) or not all(isinstance(k, str) for k in arg))
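
The double negation in the return statement is equivalent, by De Morgan's law, to "is a dict and every key is a string". The sketch below uses a hypothetical standalone function, `is_raw_table`, to show that reading along with two edge cases the docstring does not cover: an empty dict passes, and values are never inspected.

```python
from typing import Any


def is_raw_table(arg: Any) -> bool:
    """Hypothetical standalone equivalent of the check in `_is_raw_table`."""
    return isinstance(arg, dict) and all(isinstance(k, str) for k in arg)


print(is_raw_table({}))                      # True: all() over no keys is vacuously true
print(is_raw_table({"code": 1}))             # True: only keys are checked, not values
print(is_raw_table({"code": "A", 2: "B"}))   # False: one key is not a string
print(is_raw_table([("code", "A")]))         # False: not a dict
```

Because only the key types are checked, whether the values conform to the schema is a separate question, handled by validation rather than by this check.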

_raw_schema_col_type(schema, col) classmethod

Get the type of a column in the schema.

Source code in flexible_schema/json.py
@classmethod
def _raw_schema_col_type(cls, schema: JSON_Schema_T, col: str) -> dict[str, Any]:
    """Get the type of a column in the schema."""
    return schema["properties"][col]

_raw_schema_cols(schema) classmethod

Get all columns in the schema.

Source code in flexible_schema/json.py
@classmethod
def _raw_schema_cols(cls, schema: JSON_Schema_T) -> list[str]:
    """Get all columns in the schema."""
    return list(schema["properties"].keys())
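
Both accessors read directly from the schema's `properties` mapping. As a rough illustration, the sketch below applies the same two lookups to a hand-written schema dict; `schema_cols` and `schema_col_type` are hypothetical standalone stand-ins for the classmethods, and the example schema is modeled on the `Data` class shown earlier.

```python
from typing import Any

# Example raw schema, shaped like the output of Data.schema() shown earlier.
schema: dict[str, Any] = {
    "type": "object",
    "properties": {
        "subject_id": {"type": "integer"},
        "time": {"type": "string", "format": "date-time"},
        "code": {"type": "string"},
    },
    "required": ["subject_id", "time", "code"],
}


def schema_cols(schema: dict[str, Any]) -> list[str]:
    """Hypothetical stand-in for `_raw_schema_cols`: column names in declaration order."""
    return list(schema["properties"].keys())


def schema_col_type(schema: dict[str, Any], col: str) -> dict[str, Any]:
    """Hypothetical stand-in for `_raw_schema_col_type`: the column's type definition."""
    return schema["properties"][col]


print(schema_cols(schema))              # ['subject_id', 'time', 'code']
print(schema_col_type(schema, "time"))  # {'type': 'string', 'format': 'date-time'}
```

Note that looking up a column that is absent from `properties` raises `KeyError`, since the lookup is a plain dictionary index.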

_reorder_raw_table(table, table_order) classmethod

Reorder the columns of a “table” (JSON blob) to a target list.

Parameters:

Name           Type           Description                         Default
table          JSON_blob_T    The JSON blob to reorder.           required
table_order    list[str]      The order to set the columns in.    required

Returns:

Type           Description
JSON_blob_T    The reordered JSON blob.

Examples:

>>> JSONSchema._reorder_raw_table({"foo": 1, "bar": 2}, ["bar", "foo"])
{'bar': 2, 'foo': 1}
Source code in flexible_schema/json.py
@classmethod
def _reorder_raw_table(cls, table: JSON_blob_T, table_order: list[str]) -> JSON_blob_T:
    """Reorder the columns of a "table" (JSON blob) to a target list.

    Args:
        table: The JSON blob to reorder.
        table_order: The order to set the columns in.

    Returns:
        The reordered JSON blob.

    Examples:
        >>> JSONSchema._reorder_raw_table({"foo": 1, "bar": 2}, ["bar", "foo"])
        {'bar': 2, 'foo': 1}
    """
    return {k: table[k] for k in table_order}
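
Since the result is built by iterating over `table_order`, two behaviors follow that the docstring example does not show: keys missing from `table_order` are dropped, and a name in `table_order` that is absent from the table raises `KeyError`. A minimal sketch, using a hypothetical standalone `reorder` function with the same body:

```python
from typing import Any


def reorder(table: dict[str, Any], table_order: list[str]) -> dict[str, Any]:
    """Hypothetical standalone copy of the dict comprehension in `_reorder_raw_table`."""
    return {k: table[k] for k in table_order}


blob = {"foo": 1, "bar": 2, "baz": 3}

# "baz" is not listed in the target order, so it does not appear in the result.
print(reorder(blob, ["bar", "foo"]))  # {'bar': 2, 'foo': 1}

# A column named in the order but absent from the blob raises KeyError.
try:
    reorder(blob, ["missing"])
except KeyError as err:
    print(f"missing column: {err}")  # missing column: 'missing'
```

Callers that need to keep every column should therefore pass an order that lists all of the blob's keys.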

_validate_table(table) classmethod

Validate the table (a JSON blob) against the class's JSON schema, as returned by cls.schema().

Source code in flexible_schema/json.py
@classmethod
def _validate_table(cls, table: JSON_blob_T):
    """Validate the table against the schema."""
    validate(instance=table, schema=cls.schema())

map_type(field_type) classmethod

Map a Python type to a JSON schema type.

Parameters:

Name          Type    Description                Default
field_type    Any     The Python type to map.    required

Returns:

Type        Description
JSONType    The JSON schema type definition, as a dictionary (e.g., {'type': 'integer'}).

Raises:

Type          Description
ValueError    If the type is not supported.

Examples:

>>> JSONSchema.map_type(int)
{'type': 'integer'}
>>> JSONSchema.map_type(list[float])
{'type': 'array', 'items': {'type': 'number'}}
>>> JSONSchema.map_type(str)
{'type': 'string'}
>>> JSONSchema.map_type(list[datetime])
{'type': 'array', 'items': {'type': 'string', 'format': 'date-time'}}
>>> JSONSchema.map_type("integer")
{'type': 'integer'}
>>> JSONSchema.map_type((int, str))
Traceback (most recent call last):
    ...
ValueError: Unsupported type: (<class 'int'>, <class 'str'>)
Source code in flexible_schema/json.py
@classmethod
def map_type(cls, field_type: Any) -> JSONType:
    """Map a Python type to a JSON schema type.

    Args:
        field_type: The Python type to map.

    Returns:
        The JSON schema type definition, as a dictionary (e.g., ``{'type': 'integer'}``).

    Raises:
        ValueError: If the type is not supported.

    Examples:
        >>> JSONSchema.map_type(int)
        {'type': 'integer'}
        >>> JSONSchema.map_type(list[float])
        {'type': 'array', 'items': {'type': 'number'}}
        >>> JSONSchema.map_type(str)
        {'type': 'string'}
        >>> JSONSchema.map_type(list[datetime])
        {'type': 'array', 'items': {'type': 'string', 'format': 'date-time'}}
        >>> JSONSchema.map_type("integer")
        {'type': 'integer'}
        >>> JSONSchema.map_type((int, str))
        Traceback (most recent call last):
            ...
        ValueError: Unsupported type: (<class 'int'>, <class 'str'>)
    """

    origin = get_origin(field_type)

    if origin is list:
        args = get_args(field_type)
        return {"type": "array", "items": cls.map_type(args[0])}
    elif field_type is datetime or origin is datetime:
        return {"type": "string", "format": "date-time"}
    elif field_type in cls.PYTHON_TO_JSON:
        return {"type": cls.PYTHON_TO_JSON[field_type]}
    elif isinstance(field_type, str):
        return {"type": field_type}
    else:
        raise ValueError(f"Unsupported type: {field_type}")
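
The recursion on `list[...]` means nested list annotations map to nested `array` definitions. The sketch below reproduces the branching logic in a standalone function so that this can be tried without the library. `map_type_sketch` and `PYTHON_TO_JSON_SKETCH` are hypothetical names, and the contents of the lookup table are an assumption: only the `int`, `float`, and `str` entries are implied by the examples above.

```python
from datetime import datetime
from typing import Any, get_args, get_origin

# Assumed lookup table; the real PYTHON_TO_JSON mapping may contain other entries.
PYTHON_TO_JSON_SKETCH = {int: "integer", float: "number", str: "string"}


def map_type_sketch(field_type: Any) -> dict[str, Any]:
    """Hypothetical standalone version of the branching in `map_type`."""
    if get_origin(field_type) is list:
        # list[T] becomes an array whose items are the mapping of T (recursive).
        (item_type,) = get_args(field_type)
        return {"type": "array", "items": map_type_sketch(item_type)}
    if field_type is datetime:
        return {"type": "string", "format": "date-time"}
    if field_type in PYTHON_TO_JSON_SKETCH:
        return {"type": PYTHON_TO_JSON_SKETCH[field_type]}
    if isinstance(field_type, str):
        # Strings are passed through as already-resolved JSON type names.
        return {"type": field_type}
    raise ValueError(f"Unsupported type: {field_type}")


print(map_type_sketch(list[list[int]]))
# {'type': 'array', 'items': {'type': 'array', 'items': {'type': 'integer'}}}
```

Note that a string argument is passed through without being checked against the set of valid JSON type names, so a typo such as `"integr"` would only surface later, when the resulting schema is validated or used.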

JSONType

Bases: TypedDict

A JSON schema type definition.

This is used to define the type of a column in the JSON schema.

Source code in flexible_schema/json.py
class JSONType(TypedDict, total=False):
    """A JSON schema type definition.

    This is used to define the type of a column in the JSON schema.
    """

    type: str
    format: str | None = None
    items: J | None = None
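
To illustrate how such a type definition nests, here is a minimal sketch using a hypothetical `JSONTypeSketch` class. It is an approximation, not the library's definition: it assumes `items` refers recursively to the same kind of type definition, and it relies on `total=False` alone to make keys optional.

```python
from typing import TypedDict


class JSONTypeSketch(TypedDict, total=False):
    """Hypothetical approximation of a column type definition."""

    type: str
    format: str
    items: "JSONTypeSketch"  # assumed to be recursive: the element type of an array


# With total=False, keys that do not apply are simply left out.
timestamps: JSONTypeSketch = {
    "type": "array",
    "items": {"type": "string", "format": "date-time"},
}

print(timestamps["items"]["format"])  # date-time
print("format" in timestamps)         # False
```

At runtime a `TypedDict` instance is a plain `dict`, so these values can be compared directly against the output of `map_type`.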