Schema¶

Schema classes that track column names, types, and nullability throughout the query plan without touching data. For the query plan nodes and optimizer, see Query Plan & Optimizer.

LazySchema¶

LazySchema ¶

Schema for a Floe or plan node, mapping column names to their types.

A LazySchema propagates through the query plan without touching data, so you can inspect the output schema of any pipeline instantly.

Examples:

>>> schema = LazySchema.from_dicts([{"name": "Alice", "age": 30}])
>>> schema.column_names
['name', 'age']
>>> schema.dtypes
{'name': <class 'str'>, 'age': <class 'int'>}
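To illustrate what "without touching data" means, here is a minimal sketch that models a schema as a plain dict of column name to Python type. The `Schema` alias and the three free functions are hypothetical stand-ins, not the library's implementation; they only mirror the documented behaviour of `select`, `rename`, and `with_column`. No rows are involved at any step.

```python
# Hypothetical stand-in: a schema is just an ordered mapping of name -> type.
Schema = dict[str, type]


def select(schema: Schema, columns: list[str]) -> Schema:
    """Keep only the listed columns, in the order requested."""
    return {name: schema[name] for name in columns}


def rename(schema: Schema, mapping: dict[str, str]) -> Schema:
    """Rename columns; names missing from the mapping are kept as-is."""
    return {mapping.get(name, name): dtype for name, dtype in schema.items()}


def with_column(schema: Schema, name: str, dtype: type) -> Schema:
    """Add a column, or replace the type of an existing one."""
    return {**schema, name: dtype}


source = {"name": str, "age": int, "email": str}

# Each step returns a new mapping; the output schema is known immediately.
out = with_column(
    rename(select(source, ["name", "age"]), {"age": "years"}),
    "is_adult",
    bool,
)
print(list(out))  # ['name', 'years', 'is_adult']
```

Because every step is a pure function of the previous schema, the output columns of a whole pipeline can be computed before any row is read.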

Methods:

__init__ – Initialize a LazySchema.
select – Return a new schema with only the specified columns.
drop – Return a new schema without the specified columns.
rename – Return a new schema with columns renamed according to the mapping.
merge – Merge two schemas, prefixing duplicate column names.
with_column – Return a new schema with an added or replaced column.
with_dtype – Return a new schema with the type of one column changed.
from_data – Infer a schema from column names and a sample of tuple rows.
from_dicts – Infer a schema from a sample of dict rows.

Attributes:

column_names (list[str]) – List of column names in order.
dtypes (dict[str, type]) – Mapping of column names to their Python types.

column_names `property` ¶

column_names: list[str]

List of column names in order.

dtypes `property` ¶

dtypes: dict[str, type]

Mapping of column names to their Python types.

__init__ ¶

__init__(columns: dict[str, ColumnSchema] | None = None) -> None

Initialize a LazySchema.

Parameters:

columns (dict[str, ColumnSchema] | None, default: None) – Mapping of column names to ColumnSchema objects. If None, creates an empty schema.

select ¶

select(columns: list[str]) -> LazySchema

Return a new schema with only the specified columns.

Parameters:

columns (list[str]) – Column names to keep.

Examples:

>>> schema = LazySchema({"a": ColumnSchema("a", int), "b": ColumnSchema("b", str)})
>>> schema.select(["a"]).column_names
['a']

drop ¶

drop(columns: list[str]) -> LazySchema

Return a new schema without the specified columns.

Parameters:

columns (list[str]) – Column names to remove.

Examples:

>>> schema = LazySchema({"a": ColumnSchema("a", int), "b": ColumnSchema("b", str)})
>>> schema.drop(["b"]).column_names
['a']

rename ¶

rename(mapping: dict[str, str]) -> LazySchema

Return a new schema with columns renamed according to the mapping.

Parameters:

mapping (dict[str, str]) – Mapping from old column names to new column names.

Examples:

>>> schema = LazySchema({"a": ColumnSchema("a", int)})
>>> schema.rename({"a": "x"}).column_names
['x']

merge ¶

merge(other: LazySchema, suffix: str = 'right_') -> LazySchema

Merge two schemas, prefixing duplicate column names.

Parameters:

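other (LazySchema) – Schema to merge in.
suffix (str, default: 'right_') – String prepended to column names from the other schema that already exist in this schema. Despite the parameter name, it is applied as a prefix, as in 'right_id'.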

Examples:

>>> s1 = LazySchema({"id": ColumnSchema("id", int), "a": ColumnSchema("a", str)})
>>> s2 = LazySchema({"id": ColumnSchema("id", int), "b": ColumnSchema("b", str)})
>>> s1.merge(s2).column_names
['id', 'a', 'right_id', 'b']
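The collision rule can be sketched in plain Python on name lists alone. The helper `merge_names` below is hypothetical and not part of the library. It assumes, as the example above suggests, that names from the left schema are kept unchanged and only colliding names from the right schema receive the prefix. What the library does when the prefixed name itself collides is not documented here, and the sketch does not handle that case.

```python
def merge_names(left: list[str], right: list[str], prefix: str = "right_") -> list[str]:
    """Hypothetical helper: combine two column-name lists, prefixing collisions."""
    merged = list(left)
    seen = set(left)
    for name in right:
        # A right-hand name that already exists gets the prefix prepended.
        new_name = prefix + name if name in seen else name
        merged.append(new_name)
        seen.add(new_name)
    return merged


print(merge_names(["id", "a"], ["id", "b"]))  # ['id', 'a', 'right_id', 'b']
```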

with_column ¶

with_column(name: str, dtype: type, nullable: bool = True) -> LazySchema

Return a new schema with an added or replaced column.

Parameters:

name (str) – Column name.
dtype (type) – Python type for the column.
nullable (bool, default: True) – Whether the column may contain None.

Examples:

>>> schema = LazySchema({"a": ColumnSchema("a", int)})
>>> schema.with_column("b", str).column_names
['a', 'b']

with_dtype ¶

with_dtype(column: str, dtype: type) -> LazySchema

Return a new schema with the type of one column changed.

Parameters:

column (str) – Column name to change.
dtype (type) – New Python type.
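This method has no example in the reference, so here is a minimal sketch of the idea on a plain dict of column name to type. The free function `with_dtype` is a hypothetical stand-in, and raising `KeyError` for an unknown column is an assumption; the library's behaviour in that case is not documented here.

```python
def with_dtype(dtypes: dict[str, type], column: str, dtype: type) -> dict[str, type]:
    """Hypothetical stand-in: return a copy of the mapping with one type replaced."""
    if column not in dtypes:
        # Assumption: an unknown column is treated as an error.
        raise KeyError(column)
    return {name: (dtype if name == column else t) for name, t in dtypes.items()}


original = {"a": int, "b": str}
changed = with_dtype(original, "a", float)

print(changed["a"].__name__)   # float
print(original["a"].__name__)  # int (the input mapping is not modified)
```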

from_data `classmethod` ¶

from_data(columns: list[str], rows: list[tuple]) -> LazySchema

Infer a schema from column names and a sample of tuple rows.

Parameters:

columns (list[str]) – Column names.
rows (list[tuple]) – Sample rows as tuples (up to 1000 are inspected).

Examples:

>>> schema = LazySchema.from_data(["x", "y"], [(1, "hello"), (2, None)])
>>> schema.dtypes
{'x': <class 'int'>, 'y': <class 'str'>}
>>> schema["y"].nullable
True
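One way such inference could work is sketched below. The function `infer_columns` is hypothetical and not the library's code. It assumes the type is taken from the first non-None value in each column, that a column is nullable when the sample contains at least one None, and that a column with no non-None values falls back to `object`. Columns with mixed types are not handled.

```python
def infer_columns(
    columns: list[str], rows: list[tuple], sample_size: int = 1000
) -> dict[str, tuple[type, bool]]:
    """Hypothetical sketch: infer (dtype, nullable) per column from sample rows."""
    sample = rows[:sample_size]  # inspect at most sample_size rows
    result = {}
    for index, name in enumerate(columns):
        values = [row[index] for row in sample]
        non_null = [value for value in values if value is not None]
        # Assumption: the first non-None value decides the column type.
        dtype = type(non_null[0]) if non_null else object
        # The column is nullable if any sampled value was None.
        nullable = len(non_null) < len(values)
        result[name] = (dtype, nullable)
    return result


inferred = infer_columns(["x", "y"], [(1, "hello"), (2, None)])
print(inferred["y"])  # (<class 'str'>, True)
```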

from_dicts `classmethod` ¶

from_dicts(data: list[dict]) -> LazySchema

Infer a schema from a sample of dict rows.

Parameters:

data (list[dict]) – Sample rows as dicts (up to 1000 are inspected).

Examples:

>>> schema = LazySchema.from_dicts([{"name": "Alice", "age": 30}])
>>> schema.column_names
['name', 'age']
>>> schema.dtypes
{'name': <class 'str'>, 'age': <class 'int'>}

ColumnSchema¶

ColumnSchema `dataclass` ¶

Schema definition for a single column.

Stores the column's name, Python type, and nullability. Instances are immutable (the class is a frozen dataclass) and therefore hashable, so they can be used as dict keys.

Attributes:

name (str) – Column name.
dtype (type) – Python type of the column values.
nullable (bool) – Whether the column may contain None values.

Examples:

>>> cs = ColumnSchema("age", int, nullable=False)
>>> cs.name
'age'
>>> cs.dtype
<class 'int'>
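The link between a frozen dataclass and use as a dict key can be shown with the standard library alone. The class `Col` below is a hypothetical stand-in for ColumnSchema, and the default `nullable=True` is an assumption.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for ColumnSchema (the field default is assumed)."""

    name: str
    dtype: type
    nullable: bool = True


age = Col("age", int, nullable=False)

# Equal field values give equal hashes, so an equal instance finds the same entry.
notes = {age: "non-null integer"}
found = notes[Col("age", int, nullable=False)]

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    age.name = "years"
    mutated = True
except FrozenInstanceError:
    mutated = False

print(found, mutated)  # non-null integer False
```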

Methods:

with_name – Return a copy with a different column name.
with_dtype – Return a copy with a different data type.
with_nullable – Return a copy with a different nullability flag.

with_name ¶

with_name(name: str) -> ColumnSchema

Return a copy with a different column name.

with_dtype ¶

with_dtype(dtype: type) -> ColumnSchema

Return a copy with a different data type.

with_nullable ¶

with_nullable(nullable: bool) -> ColumnSchema

Return a copy with a different nullability flag.
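The three copy methods follow a common pattern for frozen dataclasses, which can be sketched with `dataclasses.replace`. The class `Col` is a hypothetical stand-in for ColumnSchema; whether the library uses `replace` internally is not stated in this reference.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for ColumnSchema (the field default is assumed)."""

    name: str
    dtype: type
    nullable: bool = True

    def with_name(self, name: str) -> "Col":
        # replace() builds a new instance; the original stays untouched.
        return replace(self, name=name)

    def with_dtype(self, dtype: type) -> "Col":
        return replace(self, dtype=dtype)

    def with_nullable(self, nullable: bool) -> "Col":
        return replace(self, nullable=nullable)


age = Col("age", int, nullable=False)
years = age.with_name("years").with_dtype(float).with_nullable(True)

print(years)  # Col(name='years', dtype=<class 'float'>, nullable=True)
print(age)    # Col(name='age', dtype=<class 'int'>, nullable=False)
```

Since each call returns a new object, the calls can be chained and the original column definition remains safe to share between schemas.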
