Schema
Schema classes track column names, types, and nullability throughout the query plan without touching data. For the query plan nodes and optimizer, see Query Plan & Optimizer.
LazySchema
Schema for a Floe or plan node, mapping column names to their types.
A LazySchema propagates through the query plan without touching data, so you can inspect the output schema of any pipeline instantly.
Examples:
>>> schema = LazySchema.from_dicts([{"name": "Alice", "age": 30}])
>>> schema.column_names
['name', 'age']
>>> schema.dtypes
{'name': <class 'str'>, 'age': <class 'int'>}
Methods:
- __init__ – Initialize a LazySchema.
- select – Return a new schema with only the specified columns.
- drop – Return a new schema without the specified columns.
- rename – Return a new schema with columns renamed according to the mapping.
- merge – Merge two schemas, prefixing duplicate column names.
- with_column – Return a new schema with an added or replaced column.
- with_dtype – Return a new schema with the type of one column changed.
- from_data – Infer a schema from column names and a sample of tuple rows.
- from_dicts – Infer a schema from a sample of dict rows.
Attributes:
- column_names (list[str]) – List of column names in order.
- dtypes (dict[str, type]) – Mapping of column names to their Python types.
__init__
Initialize a LazySchema.
Parameters:
- columns (dict[str, ColumnSchema] | None, default: None) – Mapping of column names to ColumnSchema objects. If None, creates an empty schema.
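select
Return a new schema with only the specified columns.
drop
Return a new schema without the specified columns.
rename
Return a new schema with columns renamed according to the mapping.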
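The three column-level operations above can be pictured on a plain name-to-type mapping. This is a minimal sketch, not the library's implementation: the helper names `select_cols`, `drop_cols`, and `rename_cols` are hypothetical, and the choice to return selected columns in the requested order is an assumption.

```python
# Hypothetical stand-ins operating on a plain dict; not LazySchema itself.
def select_cols(dtypes: dict[str, type], names: list[str]) -> dict[str, type]:
    # Keep only the requested columns (assumed: in the requested order).
    return {name: dtypes[name] for name in names}

def drop_cols(dtypes: dict[str, type], names: list[str]) -> dict[str, type]:
    # Keep every column that is not listed, preserving the original order.
    return {name: t for name, t in dtypes.items() if name not in names}

def rename_cols(dtypes: dict[str, type], mapping: dict[str, str]) -> dict[str, type]:
    # Columns missing from the mapping keep their current name.
    return {mapping.get(name, name): t for name, t in dtypes.items()}

schema = {"name": str, "age": int, "city": str}
selected = select_cols(schema, ["age", "name"])
dropped = drop_cols(schema, ["city"])
renamed = rename_cols(schema, {"name": "full_name"})
```

Each helper builds a new dict, so `schema` is unchanged afterwards, which mirrors the "return a new schema" wording of the method summaries.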
merge
Merge two schemas, prefixing duplicate column names.
Parameters:
- other (LazySchema) – Schema to merge in.
- suffix (str, default: 'right_') – String prepended to duplicate column names from the other schema (the parameter is named suffix but is applied as a prefix).
Examples:
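The prefixing rule can be sketched on plain name-to-type mappings. This is an illustration under assumptions, not the library's code: `merge_dtypes` is a hypothetical helper, and how the real method handles a prefixed name that already exists is not covered here.

```python
# Hypothetical helper; mimics the documented rule on plain dicts.
def merge_dtypes(
    left: dict[str, type],
    right: dict[str, type],
    prefix: str = "right_",
) -> dict[str, type]:
    merged = dict(left)  # copy so the left mapping is not modified
    for name, dtype in right.items():
        # Only names that clash with the left side receive the prefix.
        out_name = prefix + name if name in left else name
        merged[out_name] = dtype
    return merged

left = {"id": int, "name": str}
right = {"id": int, "score": float}
merged = merge_dtypes(left, right)
print(list(merged))  # ['id', 'name', 'right_id', 'score']
```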
with_column
Return a new schema with an added or replaced column.
Parameters:
- name (str) – Column name.
- dtype (type) – Python type for the column.
- nullable (bool, default: True) – Whether the column may contain None.
Examples:
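The add-or-replace behaviour can be sketched without the library. In this sketch `Col` and `with_column_sketch` are hypothetical stand-ins for ColumnSchema and the method; the point shown is that a new mapping is returned and the input is left untouched.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Col:  # hypothetical stand-in for ColumnSchema
    name: str
    dtype: type
    nullable: bool

def with_column_sketch(
    columns: dict[str, Col], name: str, dtype: type, nullable: bool = True
) -> dict[str, Col]:
    # Build a new mapping: an existing key is replaced in place,
    # a new key is appended at the end.
    return {**columns, name: Col(name, dtype, nullable)}

base = {"age": Col("age", int, True)}
added = with_column_sketch(base, "email", str, nullable=False)
replaced = with_column_sketch(base, "age", float)
```

After these calls `base` still maps `age` to an `int` column, while `added` has two columns and `replaced` carries the new type.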
with_dtype
Return a new schema with the type of one column changed.
Parameters:
- column (str) – Column name to change.
- dtype (type) – New Python type.
from_data
classmethod
Infer a schema from column names and a sample of tuple rows.
Parameters:
- columns (list[str]) – Column names.
- rows (list[tuple]) – Sample rows as tuples (up to 1000 are inspected).
Examples:
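One plausible way to infer types from a bounded sample of tuple rows is sketched below. The inference rule here (first non-None value decides the type, any None marks the column nullable) is an assumption for illustration; the library's actual rules are not documented in this section, and `infer_from_rows` is a hypothetical helper.

```python
# Hypothetical helper; the inference rule is an assumption, not the library's.
def infer_from_rows(columns, rows, sample_size=1000):
    dtypes = {name: None for name in columns}
    nullable = {name: False for name in columns}
    for row in rows[:sample_size]:  # inspect at most sample_size rows
        for name, value in zip(columns, row):
            if value is None:
                nullable[name] = True
            elif dtypes[name] is None:
                dtypes[name] = type(value)  # first non-None value wins
    return dtypes, nullable

dtypes, nullable = infer_from_rows(
    ["name", "age"],
    [("Alice", 30), ("Bob", None)],
)
```

A column whose sampled values are all None keeps a type of None in this sketch.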
from_dicts
classmethod
Infer a schema from a sample of dict rows.
Parameters:
- data (list[dict]) – Sample rows as dicts (up to 1000 are inspected).
Examples:
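Dict rows can be reduced to the column-names-plus-tuples form that from_data accepts. The sketch below shows one way to do that; whether the library works this way is an assumption, and `dicts_to_columns_and_rows` is a hypothetical helper that orders columns by first appearance and fills missing keys with None.

```python
# Hypothetical helper; not part of the documented API.
def dicts_to_columns_and_rows(data, sample_size=1000):
    sample = data[:sample_size]  # inspect at most sample_size rows
    columns = []
    for row in sample:
        for key in row:
            if key not in columns:
                columns.append(key)  # keep first-seen order
    # A key missing from a row becomes None in the tuple form.
    rows = [tuple(row.get(name) for name in columns) for row in sample]
    return columns, rows

columns, rows = dicts_to_columns_and_rows(
    [{"name": "Alice", "age": 30}, {"name": "Bob", "city": "Oslo"}]
)
```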
ColumnSchema
dataclass
Schema definition for a single column.
Stores the column's name, Python type, and nullability. Immutable (frozen dataclass) and therefore hashable, so it can be used as a dict key.
Attributes:
- name (str) – Column name.
- dtype (type) – Python type of the column values.
- nullable (bool) – Whether the column may contain None values.
Examples:
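The link between a frozen dataclass and dict-key use can be shown with the standard library alone. `ColumnSketch` below is a hypothetical stand-in with the three documented attributes, not the real ColumnSchema.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class ColumnSketch:  # hypothetical stand-in for ColumnSchema
    name: str
    dtype: type
    nullable: bool

col = ColumnSketch("age", int, True)
stats = {col: "numeric"}  # frozen + eq makes the instance hashable

# An equal instance hashes the same, so it finds the same entry.
same = ColumnSketch("age", int, True)
lookup = stats[same]

try:
    col.dtype = float  # assignment on a frozen dataclass is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```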
Methods:
- with_name – Return a copy with a different column name.
- with_dtype – Return a copy with a different data type.
- with_nullable – Return a copy with a different nullability flag.
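Copy-with-one-field-changed methods on a frozen dataclass are commonly built on `dataclasses.replace`. The sketch below assumes that approach; `ColumnSketch` is a hypothetical stand-in, and the real methods may be implemented differently.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ColumnSketch:  # hypothetical stand-in for ColumnSchema
    name: str
    dtype: type
    nullable: bool

original = ColumnSketch("age", int, True)

# Each call returns a new instance; the original is never modified.
renamed = replace(original, name="years")
retyped = replace(original, dtype=float)
required = replace(original, nullable=False)
```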