columns
Column
A simple class to represent a column in the schema.
In general, avoid using Column(...) directly in favor of either Optional or Required, which state
the column's required status explicitly.
Attributes:
| Name | Type | Description |
|---|---|---|
| dtype | | The data type contained in this column. |
| default | ColumnDType \| None | The default value this column will take on. This will typically be None. |
| nullable | Nullability | What fraction of values in this column can be null. |
| name | | The name of this column in the source table. May not be set in all instances. |
| is_optional | bool | Whether this column is required or optional. |
Examples:
By default, a column reports its optionality as None, which evaluates to False but lets you
determine whether optionality was explicitly set.
>>> C = Column(int)
>>> print(C.is_optional)
None
You can also set parameters like default, nullability, optionality, and name:
>>> C = Column(str, default="foo", nullable=Nullability.ALL, is_optional=True, name="foo_col")
>>> print(C)
Column(str, name=foo_col, is_optional=True, default=foo, nullable=Nullability.ALL)
Nullability can also be set to True, which evaluates to Nullability.ALL, or to False, which evaluates
to Nullability.NONE:
>>> print(Column(list[int], nullable=True))
Column(list, nullable=Nullability.ALL)
>>> print(Column(dict[str, int], nullable=False))
Column(dict, nullable=Nullability.NONE)
Nullability can also be set to the string equivalents of the enum values:
>>> print(Column(list[int], nullable="some"))
Column(list, nullable=Nullability.SOME)
Setting it to a value of any other type raises a TypeError:
>>> Column(int, nullable=32)
Traceback (most recent call last):
...
TypeError: Invalid type for nullable: <class 'int'>, expected bool, str, or Nullability. If using a
string, it must be one of 'none', 'some', or 'all'.
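The coercion rules shown in these examples (booleans, strings, and enum members are accepted, anything else raises a TypeError) can be sketched as a small standalone function. This is an illustration only, not the code in flexible_schema/columns.py: `NullabilitySketch` and `coerce_nullable` are hypothetical names, the error message is shortened, and the handling of unrecognized strings is an assumption.

```python
from enum import Enum


class NullabilitySketch(Enum):
    """Hypothetical stand-in for the documented Nullability enum."""

    NONE = "none"
    SOME = "some"
    ALL = "all"


def coerce_nullable(value):
    """Normalize a bool, str, or enum member to a NullabilitySketch member."""
    if isinstance(value, NullabilitySketch):
        return value
    # Booleans map to the two extremes: True -> ALL, False -> NONE.
    if isinstance(value, bool):
        return NullabilitySketch.ALL if value else NullabilitySketch.NONE
    if isinstance(value, str):
        # Lookup by value. Assumption: an unknown string simply surfaces the
        # enum's own ValueError; the docs do not show what the library does.
        return NullabilitySketch(value)
    raise TypeError(f"Invalid type for nullable: {type(value)}")
```

Checking `bool` before any other case keeps `True` and `False` from being treated as generic integers, which is why `nullable=32` is rejected while `nullable=True` is accepted.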
Source code in flexible_schema/columns.py
Nullability
Bases: Enum
A simple str-like enum to represent the nullability of a column.
Once the minimum supported Python version is 3.11, this should be converted to a StrEnum.
Attributes:
| Name | Type | Description |
|---|---|---|
| NONE | | No value in the given column can be null. |
| SOME | | Some, but not all, values in the given column can be null. |
| ALL | | Any value up to and including all values in the given column can be null. |
Examples:
>>> Nullability.NONE
<Nullability.NONE: 'none'>
>>> Nullability.SOME == "some"
True
>>> Nullability.ALL == "foo"
False
>>> Nullability.NONE == Nullability.NONE
True
>>> Nullability.SOME == Nullability.ALL
False
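For comparison, one common way to get string-comparable enum members on Python versions before 3.11 is to mix `str` into the enum. The sketch below uses the hypothetical name `StrNullability`; because the class documented here lists only Enum as its base, the library evidently achieves the same comparisons differently, so treat this as an alternative technique rather than its implementation.

```python
from enum import Enum


class StrNullability(str, Enum):
    """Hypothetical str-mixin enum; members compare equal to their string values."""

    NONE = "none"
    SOME = "some"
    ALL = "all"


print(repr(StrNullability.NONE))      # <StrNullability.NONE: 'none'>
print(StrNullability.SOME == "some")  # True
print(StrNullability.ALL == "foo")    # False
```

On Python 3.11 and later, `enum.StrEnum` provides the same equality behavior without the explicit mixin.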
Optional
Bases: Column
A class to represent optional columns in a schema.
Examples:
>>> O = Optional(int)
>>> print(O)
Optional(int)
>>> O.dtype
<class 'int'>
>>> O.has_default
False
>>> O.is_optional
True
The default nullability for Optional columns is Nullability.ALL:
>>> O.nullable
<Nullability.ALL: 'all'>
You can also define Optional columns with default values and nullability constraints:
>>> O = Optional(int, default=42, nullable=True)
>>> O
Optional(int, default=42, nullable=Nullability.ALL)
>>> O.has_default
True
>>> O.is_optional
True
>>> O.nullable
<Nullability.ALL: 'all'>
>>> O = Optional(list[str], default=["foo"], nullable=False)
>>> O.nullable
<Nullability.NONE: 'none'>
>>> O.has_default
True
>>> O.default
['foo']
Default values are deep-copied to avoid mutable default arguments:
>>> default_list = ["foo"]
>>> O = Optional(list[str], default=default_list)
>>> O.default
['foo']
>>> O.default[0] = "bar"
>>> O.default
['bar']
>>> default_list
['foo']
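The isolation shown in this example is what `copy.deepcopy` provides. A minimal sketch, using a hypothetical `DefaultHolder` class rather than the library's own code:

```python
import copy


class DefaultHolder:
    """Hypothetical container that stores a private copy of its default."""

    def __init__(self, default):
        # Deep-copy so that mutating self.default never touches the caller's
        # object, and mutating the caller's object never touches self.default.
        self.default = copy.deepcopy(default)


shared = ["foo"]
holder = DefaultHolder(shared)
holder.default[0] = "bar"
print(holder.default)  # ['bar']
print(shared)          # ['foo']
```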
You can’t override is_optional at or after initialization:
>>> Optional(int, is_optional=False)
Traceback (most recent call last):
...
ValueError: is_optional cannot be set to False for Optional columns
>>> O.is_optional = False
Traceback (most recent call last):
...
ValueError: is_optional cannot be set to False for Optional columns
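A fixed flag like this is commonly enforced with a property whose setter validates the new value. The sketch below assumes that approach; `AlwaysOptional` is a hypothetical class, and nothing here is confirmed to match how flexible_schema implements the check.

```python
class AlwaysOptional:
    """Hypothetical class whose is_optional flag is fixed to True."""

    def __init__(self, is_optional=None):
        # None means "not specified"; an explicit value is routed through
        # the setter so that the same validation applies at construction.
        if is_optional is not None:
            self.is_optional = is_optional

    @property
    def is_optional(self):
        return True

    @is_optional.setter
    def is_optional(self, value):
        if not value:
            raise ValueError("is_optional cannot be set to False for Optional columns")
```

Routing the constructor argument through the setter means the error is identical whether the invalid value arrives at or after initialization.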
Required
Bases: Column
A class to represent required columns in a schema.
Examples:
>>> R = Required(int)
>>> print(R)
Required(int)
>>> R.dtype
<class 'int'>
>>> R.has_default
False
>>> R.is_optional
False
The default nullability for Required columns is Nullability.SOME:
>>> R.nullable
<Nullability.SOME: 'some'>
You can also define Required columns with different nullability constraints:
>>> R = Required(int, nullable=True)
>>> R
Required(int, nullable=Nullability.ALL)
>>> R.is_optional
False
>>> R.nullable
<Nullability.ALL: 'all'>
>>> R = Required(list[str], nullable=False)
>>> R.nullable
<Nullability.NONE: 'none'>
You can’t override is_optional at or after initialization:
>>> Required(int, is_optional=True)
Traceback (most recent call last):
...
ValueError: is_optional cannot be set to True for Required columns
>>> R.is_optional = True
Traceback (most recent call last):
...
ValueError: is_optional cannot be set to True for Required columns
Required columns can’t have default values:
>>> Required(int, default=3)
Traceback (most recent call last):
...
ValueError: Required columns cannot have a default value
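Distinguishing "no default given" from "a default of None" usually calls for a sentinel object, which is one way a `has_default` flag and the restriction above could be implemented. The following is a sketch under that assumption; `_MISSING`, `ColumnSketch`, and `RequiredSketch` are hypothetical names, not part of flexible_schema.

```python
_MISSING = object()  # hypothetical sentinel meaning "no default was given"


class ColumnSketch:
    """Hypothetical column that records whether a default was supplied."""

    def __init__(self, dtype, default=_MISSING):
        self.dtype = dtype
        self._default = default

    @property
    def has_default(self):
        # Identity check against the sentinel, so default=None still counts.
        return self._default is not _MISSING


class RequiredSketch(ColumnSketch):
    """Hypothetical required column that refuses any default value."""

    def __init__(self, dtype, default=_MISSING):
        if default is not _MISSING:
            raise ValueError("Required columns cannot have a default value")
        super().__init__(dtype)
```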
_resolve_annotation(annotation, type_mapper)
Builds a column for a given dataclass field annotation, using a type-mapping function to resolve its data type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| annotation | Any | The type of the dataclass field that is being converted. | required |
| type_mapper | Callable[[ColumnDType], ColumnDType] | A function to convert between a base Python type (e.g., … | required |
Returns:
| Type | Description |
|---|---|
| Column \| Optional \| Required | A column corresponding to the annotation type. |
Examples:
>>> import pyarrow as pa
>>> def type_mapper(T):
... if T is int:
... return pa.int64()
... elif T is str:
... return pa.string()
... else:
... raise TypeError("Can't map types that aren't ints or strs")
>>> _resolve_annotation(int, type_mapper)
Column(DataType(int64))
>>> _resolve_annotation(int | None, type_mapper)
Column(DataType(int64), nullable=Nullability.ALL)
If you pass in a type that the type mapper rejects, the mapper's error propagates:
>>> _resolve_annotation(list[int], type_mapper)
Traceback (most recent call last):
...
TypeError: Can't map types that aren't ints or strs
Note that if you pass in a Column, the type is still re-mapped.
>>> _resolve_annotation(Column(str, nullable=False), type_mapper)
Column(DataType(string), nullable=Nullability.NONE)
However, if you pass in a Column whose base type cannot be remapped, no error is raised and the original type is kept:
>>> _resolve_annotation(Column(list[str], nullable=False), type_mapper)
Column(list, nullable=Nullability.NONE)
resolve_dataclass_field(field, type_mapper)
Resolves a dataclass field into a column specification.