Schema

These schema classes track column names, types, and nullability throughout the query plan without touching data. For the query plan nodes and optimizer, see Query Plan & Optimizer.


LazySchema

Schema for a Floe or plan node, mapping column names to their types.

A LazySchema propagates through the query plan without touching data, so you can inspect the output schema of any pipeline without executing it.

Examples:

>>> schema = LazySchema.from_dicts([{"name": "Alice", "age": 30}])
>>> schema.column_names
['name', 'age']
>>> schema.dtypes
{'name': <class 'str'>, 'age': <class 'int'>}
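
The idea of propagating a schema through a plan can be sketched with plain dictionaries. This is a toy illustration, not the library's implementation: the `Schema` alias and the `select` and `rename` functions below are hypothetical stand-ins for the LazySchema methods of the same name, and the plan is modelled as a simple list of schema-to-schema functions.

```python
# Stand-in for LazySchema: column name -> Python type (assumption for illustration).
Schema = dict[str, type]


def select(schema: Schema, columns: list[str]) -> Schema:
    """Keep only the listed columns, in the order given."""
    return {name: schema[name] for name in columns}


def rename(schema: Schema, mapping: dict[str, str]) -> Schema:
    """Rename columns found in the mapping; leave the others unchanged."""
    return {mapping.get(name, name): dtype for name, dtype in schema.items()}


source: Schema = {"name": str, "age": int, "email": str}

# Each plan step maps an input schema to an output schema; no rows are read.
plan = [
    lambda s: select(s, ["name", "age"]),
    lambda s: rename(s, {"age": "years"}),
]

out = source
for step in plan:
    out = step(out)

print(out)  # {'name': <class 'str'>, 'years': <class 'int'>}
```

Because every step only transforms names and types, the output schema is known as soon as the plan is built, regardless of how much data the source holds.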

Methods:

  • __init__

    Initialize a LazySchema.

  • select

    Return a new schema with only the specified columns.

  • drop

    Return a new schema without the specified columns.

  • rename

    Return a new schema with columns renamed according to the mapping.

  • merge

    Merge two schemas, prefixing duplicate column names.

  • with_column

    Return a new schema with an added or replaced column.

  • with_dtype

    Return a new schema with the type of one column changed.

  • from_data

    Infer a schema from column names and a sample of tuple rows.

  • from_dicts

    Infer a schema from a sample of dict rows.

Attributes:

  • column_names (list[str]) –

    List of column names in order.

  • dtypes (dict[str, type]) –

    Mapping of column names to their Python types.

column_names property

column_names: list[str]

List of column names in order.

dtypes property

dtypes: dict[str, type]

Mapping of column names to their Python types.

__init__

__init__(columns: dict[str, ColumnSchema] | None = None) -> None

Initialize a LazySchema.

Parameters:

  • columns (dict[str, ColumnSchema] | None, default: None ) –

    Mapping of column names to ColumnSchema objects. If None, creates an empty schema.

select

select(columns: list[str]) -> LazySchema

Return a new schema with only the specified columns.

Parameters:

  • columns (list[str]) –

    Column names to keep.

Examples:

>>> schema = LazySchema({"a": ColumnSchema("a", int), "b": ColumnSchema("b", str)})
>>> schema.select(["a"]).column_names
['a']

drop

drop(columns: list[str]) -> LazySchema

Return a new schema without the specified columns.

Parameters:

  • columns (list[str]) –

    Column names to remove.

Examples:

>>> schema = LazySchema({"a": ColumnSchema("a", int), "b": ColumnSchema("b", str)})
>>> schema.drop(["b"]).column_names
['a']

rename

rename(mapping: dict[str, str]) -> LazySchema

Return a new schema with columns renamed according to the mapping.

Parameters:

  • mapping (dict[str, str]) –

    Old name to new name mapping.

Examples:

>>> schema = LazySchema({"a": ColumnSchema("a", int)})
>>> schema.rename({"a": "x"}).column_names
['x']

merge

merge(other: LazySchema, suffix: str = 'right_') -> LazySchema

Merge two schemas, prefixing duplicate column names.

Parameters:

  • other (LazySchema) –

    Schema to merge in.

  • suffix (str, default: 'right_' ) –

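    Prefix added to duplicate column names from the other schema. Despite the parameter's name, the value is prepended, not appended (for example, id becomes right_id).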

Examples:

>>> s1 = LazySchema({"id": ColumnSchema("id", int), "a": ColumnSchema("a", str)})
>>> s2 = LazySchema({"id": ColumnSchema("id", int), "b": ColumnSchema("b", str)})
>>> s1.merge(s2).column_names
['id', 'a', 'right_id', 'b']
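
The renaming rule can be sketched on plain lists of column names. This is a guess at the behavior based on the example above, not the library's code: `merge_names` is a hypothetical helper, and it assumes that only names from the other schema that already exist on the left are prefixed.

```python
def merge_names(left: list[str], right: list[str], prefix: str = "right_") -> list[str]:
    """Hypothetical helper: combine two column lists, prefixing duplicates from `right`."""
    merged = list(left)
    seen = set(left)
    for name in right:
        # A name that collides with an existing column gets the prefix.
        out = prefix + name if name in seen else name
        merged.append(out)
        seen.add(out)
    return merged


print(merge_names(["id", "a"], ["id", "b"]))  # ['id', 'a', 'right_id', 'b']
```

The sketch does not handle the case where the prefixed name itself collides with an existing column; how merge treats that case is not stated here.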

with_column

with_column(name: str, dtype: type, nullable: bool = True) -> LazySchema

Return a new schema with an added or replaced column.

Parameters:

  • name (str) –

    Column name.

  • dtype (type) –

    Python type for the column.

  • nullable (bool, default: True ) –

    Whether the column may contain None.

Examples:

>>> schema = LazySchema({"a": ColumnSchema("a", int)})
>>> schema.with_column("b", str).column_names
['a', 'b']

with_dtype

with_dtype(column: str, dtype: type) -> LazySchema

Return a new schema with the type of one column changed.

Parameters:

  • column (str) –

    Column name to change.

  • dtype (type) –

    New Python type.

from_data classmethod

from_data(columns: list[str], rows: list[tuple]) -> LazySchema

Infer a schema from column names and a sample of tuple rows.

Parameters:

  • columns (list[str]) –

    Column names.

  • rows (list[tuple]) –

    Sample rows as tuples (up to 1000 are inspected).

Examples:

>>> schema = LazySchema.from_data(["x", "y"], [(1, "hello"), (2, None)])
>>> schema.dtypes
{'x': <class 'int'>, 'y': <class 'str'>}
>>> schema["y"].nullable
True
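
One way such inference could work is sketched below. The rules are assumptions made for illustration and may differ from what from_data actually does: `infer_columns` is a hypothetical helper that takes the type of the first non-None value in each column, marks a column nullable when the sample contains a None, and falls back to `object` for an all-None column.

```python
def infer_columns(
    columns: list[str], rows: list[tuple], sample_size: int = 1000
) -> dict[str, tuple[type, bool]]:
    """Hypothetical helper: map each column name to (dtype, nullable) from a row sample."""
    sample = rows[:sample_size]  # inspect at most `sample_size` rows
    result: dict[str, tuple[type, bool]] = {}
    for i, name in enumerate(columns):
        values = [row[i] for row in sample]
        non_null = [v for v in values if v is not None]
        # Assumption: the first non-None value decides the type.
        dtype = type(non_null[0]) if non_null else object
        # Assumption: a column is nullable if any sampled value is None.
        nullable = len(non_null) < len(values)
        result[name] = (dtype, nullable)
    return result


print(infer_columns(["x", "y"], [(1, "hello"), (2, None)]))
```

Because only a sample is inspected, a None that first appears after the sampled rows would not be reflected in a schema inferred this way.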

from_dicts classmethod

from_dicts(data: list[dict]) -> LazySchema

Infer a schema from a sample of dict rows.

Parameters:

  • data (list[dict]) –

    Sample rows as dicts (up to 1000 are inspected).

Examples:

>>> schema = LazySchema.from_dicts([{"name": "Alice", "age": 30}])
>>> schema.column_names
['name', 'age']
>>> schema.dtypes
{'name': <class 'str'>, 'age': <class 'int'>}

ColumnSchema dataclass

Schema definition for a single column.

Stores the column's name, Python type, and nullability. Instances are immutable (a frozen dataclass) and therefore hashable, so they can be used as dict keys.

Attributes:

  • name (str) –

    Column name.

  • dtype (type) –

    Python type of the column values.

  • nullable (bool) –

    Whether the column may contain None values.

Examples:

>>> cs = ColumnSchema("age", int, nullable=False)
>>> cs.name
'age'
>>> cs.dtype
<class 'int'>

Methods:

  • with_name

    Return a copy with a different column name.

  • with_dtype

    Return a copy with a different data type.

  • with_nullable

    Return a copy with a different nullability flag.

with_name

with_name(name: str) -> ColumnSchema

Return a copy with a different column name.

with_dtype

with_dtype(dtype: type) -> ColumnSchema

Return a copy with a different data type.

with_nullable

with_nullable(nullable: bool) -> ColumnSchema

Return a copy with a different nullability flag.
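
The copy-on-change pattern behind these methods can be sketched with a frozen dataclass and `dataclasses.replace`. The `Column` class below is a stand-in written for illustration, not the real ColumnSchema; in particular, the default of `nullable=True` is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Column:
    """Stand-in for ColumnSchema, for illustration only."""

    name: str
    dtype: type
    nullable: bool = True  # assumed default

    def with_name(self, name: str) -> "Column":
        return replace(self, name=name)

    def with_dtype(self, dtype: type) -> "Column":
        return replace(self, dtype=dtype)

    def with_nullable(self, nullable: bool) -> "Column":
        return replace(self, nullable=nullable)


age = Column("age", int, nullable=False)
years = age.with_name("years").with_dtype(float)

print(age)    # the original is unchanged
print(years)  # the copy carries the new name and dtype, and keeps nullable=False

# Frozen dataclasses are hashable, so an instance can serve as a dict key.
labels = {age: "original"}
print(labels[Column("age", int, nullable=False)])  # original
```

Each `with_*` call returns a new object, so a schema that holds the original column is never affected by later changes.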