
Query Plan Nodes

This page documents the internal plan node classes that form the query execution tree. Each node implements the Volcano (iterator) model: data flows upward through the tree as parent nodes pull rows from their children.

These classes are not part of the public API but are documented here for users who want to understand or extend the query engine internals.


PlanNode


Abstract base class for all query plan nodes.

Every node in the query plan tree inherits from PlanNode. A parent node pulls rows from its children by calling execute() or execute_batched(), so data flows upward through the tree on demand.

Methods:

  • schema

    Return the output schema of this node.

  • execute

    Yield rows one at a time by flattening batched output.

  • execute_batched

    Yield rows in batches of up to _BATCH_SIZE.

  • fast_count

    Return the exact row count without executing, if known.

  • children

    Return the direct child nodes of this plan node.

  • explain

    Return a human-readable representation of the plan tree.

schema

schema() -> LazySchema

Return the output schema of this node.

Returns:

  • LazySchema

    The schema describing columns and types produced by this node.

execute

execute() -> Iterator[tuple]

Yield rows one at a time by flattening batched output.

Returns:

  • Iterator[tuple]

    An iterator of tuples, each representing one row.

execute_batched

execute_batched() -> Iterator[list[tuple]]

Yield rows in batches of up to _BATCH_SIZE.

Returns:

  • Iterator[list[tuple]]

    An iterator of lists, where each list contains up to _BATCH_SIZE (1024) row tuples.

fast_count

fast_count() -> int | None

Return the exact row count without executing, if known.

Returns:

  • int | None

    The row count, or None if it cannot be determined cheaply.

children

children() -> list[PlanNode]

Return the direct child nodes of this plan node.

Returns:

  • list[PlanNode]

    A list of child PlanNode instances (empty for leaf nodes).

explain

explain(indent: int = 0) -> str

Return a human-readable representation of the plan tree.

Parameters:

  • indent (int, default: 0 ) –

    Current indentation level (used for recursive calls).

Returns:

  • str

    A multi-line string showing this node and all descendants.
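
The pull-based contract described above can be sketched in a few lines. This is an illustration only, not the library's source: MiniNode, MiniScan, and MiniPass are hypothetical stand-ins, and the indentation format used by explain here is an assumption.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Iterator

_BATCH_SIZE = 1024  # the documented batch size


class MiniNode(ABC):
    """Hypothetical stand-in for PlanNode (illustration only)."""

    @abstractmethod
    def execute_batched(self) -> Iterator[list[tuple]]:
        ...

    def children(self) -> list[MiniNode]:
        return []  # leaf nodes have no children

    def execute(self) -> Iterator[tuple]:
        # Flatten the batched output into individual rows.
        for batch in self.execute_batched():
            yield from batch

    def explain(self, indent: int = 0) -> str:
        # One line per node; children are indented one level deeper.
        lines = ["  " * indent + type(self).__name__]
        lines += [child.explain(indent + 1) for child in self.children()]
        return "\n".join(lines)


class MiniScan(MiniNode):
    """Leaf node over an in-memory list of row tuples."""

    def __init__(self, data: list[tuple]) -> None:
        self.data = data

    def execute_batched(self) -> Iterator[list[tuple]]:
        for start in range(0, len(self.data), _BATCH_SIZE):
            yield self.data[start:start + _BATCH_SIZE]


class MiniPass(MiniNode):
    """Parent node that pulls batches from its child unchanged."""

    def __init__(self, child: MiniNode) -> None:
        self.child = child

    def children(self) -> list[MiniNode]:
        return [self.child]

    def execute_batched(self) -> Iterator[list[tuple]]:
        yield from self.child.execute_batched()
```

Nothing runs until a consumer iterates the root node, which is what lets parents stop early without forcing their children to produce every row.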


Data Source Nodes

ScanNode


Bases: PlanNode

Leaf node that reads from an in-memory list of row tuples.

This is the most common data source node, created when a LazyFrame is constructed from Python data. It supports fast_count because the full dataset is already materialised.

Parameters:

  • data (list) –

    List of row tuples.

  • columns (list[str]) –

    Column names corresponding to tuple positions.

  • lazy_schema (LazySchema | None, default: None ) –

    Optional pre-computed schema; inferred from data if omitted.


IteratorSourceNode


Bases: PlanNode

Leaf node that reads from a lazily-evaluated iterator factory.

Each call to execute_batched invokes the factory to produce a fresh iterator, enabling repeatable reads from streaming sources such as file readers.

Parameters:

  • columns (list[str]) –

    Column names for the produced rows.

  • lazy_schema (LazySchema) –

    Schema describing the output columns.

  • iterator_factory (Callable[[], Iterator[tuple]]) –

    A zero-argument callable that returns an iterator of row tuples.

  • source_label (str, default: 'Iterator' ) –

    Descriptive label shown in explain output.
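
A rough sketch of why a factory, rather than a bare iterator, makes reads repeatable. The helper name read_batches and its batch_size parameter are hypothetical; only the idea of calling the factory once per execution comes from the description above.

```python
from itertools import islice
from typing import Callable, Iterator


def read_batches(
    factory: Callable[[], Iterator[tuple]], batch_size: int = 1024
) -> Iterator[list[tuple]]:
    rows = factory()  # a fresh iterator for every execution
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def make_rows() -> Iterator[tuple]:
    # Stands in for a streaming source such as a file reader.
    return iter([(0,), (1,), (2,), (3,), (4,)])


first = list(read_batches(make_rows, batch_size=2))
second = list(read_batches(make_rows, batch_size=2))  # same rows again
```

Passing an already-created iterator instead would leave the second execution empty, because the first one would have exhausted it.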


Transform Nodes

ProjectNode


Bases: PlanNode

Selects or reorders columns, or evaluates computed expressions.

When columns is provided, only those columns are kept, in the order given (comparable to a SQL SELECT of named columns). When exprs is provided, each expression is evaluated to produce a new set of output columns.

Parameters:

  • child (PlanNode) –

    Input plan node.

  • columns (list[str] | None, default: None ) –

    Column names to select. Mutually exclusive with exprs.

  • exprs (list[Expr] | None, default: None ) –

    Expressions to evaluate. Mutually exclusive with columns.
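
The column-selection case can be pictured as index lookups on each row tuple. This is a minimal sketch with a hypothetical helper name (project); the expression-evaluating case is not covered because the Expr interface is not described here.

```python
from typing import Iterable, Iterator


def project(
    rows: Iterable[tuple], columns: list[str], keep: list[str]
) -> Iterator[tuple]:
    # Resolve names to tuple positions once, then reuse them for every row.
    positions = [columns.index(name) for name in keep]
    for row in rows:
        yield tuple(row[i] for i in positions)


rows = [(1, "ann", True), (2, "bob", False)]
selected = list(project(rows, ["id", "name", "active"], ["active", "id"]))
```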


FilterNode


Bases: PlanNode

Filters rows by evaluating a boolean predicate expression.

Only rows for which predicate evaluates to a truthy value are passed through. The output schema is unchanged.

Parameters:

  • child (PlanNode) –

    Input plan node.

  • predicate (Expr) –

    Boolean expression used to filter rows.
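
As an illustration of batch-wise filtering, the sketch below uses a plain Python callable in place of an Expr predicate. Both that substitution and the helper name filter_batches are assumptions; skipping empty batches is likewise a choice made for the example.

```python
from typing import Callable, Iterable, Iterator


def filter_batches(
    batches: Iterable[list[tuple]], predicate: Callable[[tuple], object]
) -> Iterator[list[tuple]]:
    for batch in batches:
        kept = [row for row in batch if predicate(row)]  # truthy rows pass
        if kept:  # skip batches that were filtered out entirely
            yield kept


batches = [[(1,), (2,), (3,)], [(4,)], [(5,), (6,)]]
even = list(filter_batches(batches, lambda row: row[0] % 2 == 0))
```

Rows keep their shape, which is why the output schema is the same as the input schema.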


WithColumnNode


Bases: PlanNode

Appends a new computed column to the output.

Evaluates expr for every row and adds the result as a new column named name. The output schema is the child's schema plus the new column.

Parameters:

  • child (PlanNode) –

    Input plan node.

  • name (str) –

    Name for the new column.

  • expr (Expr) –

    Expression to evaluate for each row.


ApplyNode


Bases: PlanNode

Applies a scalar function to one or more columns.

When columns is provided, only those columns are transformed; otherwise func is applied to every value in every column.

Parameters:

  • child (PlanNode) –

    Input plan node.

  • func (Callable[..., Any]) –

    Callable applied element-wise to each value.

  • columns (list[str] | None, default: None ) –

    Columns to transform. If None, all columns are transformed.

  • output_dtype (type | None, default: None ) –

    Optional output type to record in the schema.


RenameNode


Bases: PlanNode

Renames columns without modifying data.

A zero-cost metadata-only operation — the underlying rows are passed through unchanged, only the schema is updated.

Parameters:

  • child (PlanNode) –

    Input plan node.

  • mapping (dict[str, str]) –

    Dictionary mapping old column names to new names.


SortNode


Bases: PlanNode

Sorts all rows by one or more columns using Timsort.

Materialises the full input before sorting. Multi-column sorts are composed via sequential stable sorts in reverse priority order.

Parameters:

  • child (PlanNode) –

    Input plan node.

  • by (list[str]) –

    Column names to sort by.

  • ascending (list[bool] | None, default: None ) –

    Sort direction per column. Defaults to ascending for all.
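
The "sequential stable sorts in reverse priority order" technique works because Python's list.sort is stable, also when reverse=True is passed. A minimal sketch follows; the helper name sort_rows is hypothetical.

```python
from typing import Iterable, Optional


def sort_rows(
    rows: Iterable[tuple],
    columns: list[str],
    by: list[str],
    ascending: Optional[list[bool]] = None,
) -> list[tuple]:
    if ascending is None:
        ascending = [True] * len(by)
    out = list(rows)  # materialise the full input before sorting
    # Sort by the least significant key first. Each later (more significant)
    # pass is stable, so it preserves the order set by earlier passes on ties.
    for name, asc in reversed(list(zip(by, ascending))):
        pos = columns.index(name)
        out.sort(key=lambda row, pos=pos: row[pos], reverse=not asc)
    return out


rows = [("b", 2), ("a", 2), ("c", 1), ("a", 1)]
result = sort_rows(rows, ["name", "n"], by=["n", "name"], ascending=[False, True])
# result == [("a", 2), ("b", 2), ("a", 1), ("c", 1)]
```

This approach handles mixed ascending and descending keys without having to negate values, which would not work for strings.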


Join Nodes

JoinNode


Bases: PlanNode

Hash-based join of two input plan nodes.

Materialises the right side into a hash table keyed on the join columns, then streams the left side and probes the table. Supports inner, left, and full join modes.

Parameters:

  • left (PlanNode) –

    Left input plan node.

  • right (PlanNode) –

    Right input plan node.

  • left_on (list[str]) –

    Join-key column names from the left side.

  • right_on (list[str]) –

    Join-key column names from the right side.

  • how (JoinHow, default: 'inner' ) –

    Join type — 'inner', 'left', or 'full'.
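
The build-then-probe strategy can be sketched as below. This is not the library's code: the helper name hash_join, the positional key arguments, the explicit width arguments, and the output layout (left columns followed by right columns, with None padding for unmatched sides) are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def hash_join(
    left: Iterable[tuple],
    right: Iterable[tuple],
    left_key: list[int],
    right_key: list[int],
    left_width: int,
    right_width: int,
    how: str = "inner",
) -> Iterator[tuple]:
    # Build phase: materialise the right side into a hash table.
    table = defaultdict(list)
    for idx, rrow in enumerate(right):
        table[tuple(rrow[i] for i in right_key)].append((idx, rrow))

    # Probe phase: stream the left side and look up each key.
    matched = set()
    for lrow in left:
        hits = table.get(tuple(lrow[i] for i in left_key), [])
        for idx, rrow in hits:
            matched.add(idx)
            yield lrow + rrow
        if not hits and how in ("left", "full"):
            yield lrow + (None,) * right_width

    # A full join also emits right rows that no left row matched.
    if how == "full":
        for bucket in table.values():
            for idx, rrow in bucket:
                if idx not in matched:
                    yield (None,) * left_width + rrow


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (1, "y"), (4, "z")]
inner = list(hash_join(left, right, [0], [0], 2, 2))
```

Only the right side is held in memory, so placing the smaller input on the right keeps the hash table small.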


SortedMergeJoinNode


Bases: PlanNode

Sort-merge join for pre-sorted inputs.

Both inputs must be sorted on their respective join keys. Two cursors advance in lockstep, emitting matches when keys are equal. Memory usage is O(1) for one-to-one joins; for many-to-many matches it is O(g), where g is the size of the buffered group of rows that share a join key.

Parameters:

  • left (PlanNode) –

    Left input plan node (sorted by left_on).

  • right (PlanNode) –

    Right input plan node (sorted by right_on).

  • left_on (list[str]) –

    Join-key column names from the left side.

  • right_on (list[str]) –

    Join-key column names from the right side.

  • how (JoinHow, default: 'inner' ) –

    Join type — 'inner', 'left', or 'full'.
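
A sketch of the two-cursor merge for the inner-join case only. The helper name merge_join and the positional key arguments are hypothetical, and both inputs are assumed to be sorted ascending on their keys. The buffered group list is the O(g) part of the memory cost.

```python
from typing import Iterable, Iterator


def merge_join(
    left: Iterable[tuple],
    right: Iterable[tuple],
    left_key: list[int],
    right_key: list[int],
) -> Iterator[tuple]:
    def lkey(row: tuple) -> tuple:
        return tuple(row[i] for i in left_key)

    def rkey(row: tuple) -> tuple:
        return tuple(row[i] for i in right_key)

    left_it, right_it = iter(left), iter(right)
    lrow, rrow = next(left_it, None), next(right_it, None)
    while lrow is not None and rrow is not None:
        lk, rk = lkey(lrow), rkey(rrow)
        if lk < rk:
            lrow = next(left_it, None)   # left cursor is behind
        elif lk > rk:
            rrow = next(right_it, None)  # right cursor is behind
        else:
            # Buffer the run of right rows sharing this key.
            group = []
            while rrow is not None and rkey(rrow) == lk:
                group.append(rrow)
                rrow = next(right_it, None)
            # Pair every left row with the same key against the buffer.
            while lrow is not None and lkey(lrow) == lk:
                for buffered in group:
                    yield lrow + buffered
                lrow = next(left_it, None)


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = list(merge_join(left, right, [0], [0]))
```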


Aggregation Nodes

AggNode


Bases: PlanNode

Hash-based group-by aggregation node.

Groups rows by group_by columns using a hash map and maintains running accumulators for each aggregation expression. Memory usage is O(k) where k is the number of distinct groups.

Parameters:

  • child (PlanNode) –

    Input plan node.

  • group_by (list[str]) –

    Column names to group by.

  • agg_exprs (list[AggExpr]) –

    Aggregation expressions to evaluate per group.
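
The running-accumulator idea can be illustrated with a dictionary keyed on the group columns. The sketch hard-codes a sum and a count in place of real AggExpr objects, and the helper name hash_aggregate is hypothetical.

```python
from typing import Iterable, Iterator


def hash_aggregate(
    rows: Iterable[tuple], key_idx: list[int], value_idx: int
) -> Iterator[tuple]:
    accumulators = {}  # one (sum, count) entry per distinct group
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        total, count = accumulators.get(key, (0, 0))
        accumulators[key] = (total + row[value_idx], count + 1)
    # Groups come out in first-seen order (dict insertion order).
    for key, (total, count) in accumulators.items():
        yield key + (total, count)


rows = [("a", 1), ("b", 10), ("a", 2), ("b", 20), ("c", 5)]
grouped = list(hash_aggregate(rows, [0], 1))
```

The input does not need to be sorted, but no group can be emitted until the whole input has been consumed.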


SortedAggNode


Bases: PlanNode

Streaming group-by aggregation for pre-sorted input.

Assumes the child yields rows already sorted by the group_by columns. Groups are detected by key changes, so only one set of accumulators is live at a time and memory usage stays O(1) regardless of the number of groups.

Parameters:

  • child (PlanNode) –

    Input plan node (must be pre-sorted by group_by).

  • group_by (list[str]) –

    Column names to group by.

  • agg_exprs (list[AggExpr]) –

    Aggregation expressions to evaluate per group.
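
Detecting groups by key changes is what itertools.groupby does, so a streaming version can be sketched on top of it. As before, a sum and a count stand in for real aggregation expressions, and sorted_aggregate is a hypothetical name.

```python
from itertools import groupby
from typing import Iterable, Iterator


def sorted_aggregate(
    rows: Iterable[tuple], key_idx: list[int], value_idx: int
) -> Iterator[tuple]:
    def key_of(row: tuple) -> tuple:
        return tuple(row[i] for i in key_idx)

    # groupby starts a new group whenever the key changes, so it only
    # gives correct results when equal keys are adjacent (sorted input).
    for key, group in groupby(rows, key=key_of):
        total = count = 0
        for row in group:
            total += row[value_idx]
            count += 1
        yield key + (total, count)  # emitted as soon as the group ends


rows = iter([("a", 1), ("a", 2), ("b", 10), ("c", 5), ("c", 6)])
streamed = list(sorted_aggregate(rows, [0], 1))
```

If the input is not actually sorted, the same key shows up as several separate groups instead of raising an error, so the precondition matters.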


Reshape Nodes

ExplodeNode


Bases: PlanNode

Unnests a list-valued column into separate rows.

For each row, the list in column is expanded so that every element produces its own row. None values pass through unchanged.

Parameters:

  • child (PlanNode) –

    Input plan node.

  • column (str) –

    Name of the list-valued column to explode.
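
A small sketch of the unnesting step. The helper name explode and the positional column argument are hypothetical. Dropping rows whose list is empty is an assumption of this example; the description above only specifies the behaviour for None.

```python
from typing import Iterable, Iterator


def explode(rows: Iterable[tuple], col: int) -> Iterator[tuple]:
    for row in rows:
        value = row[col]
        if value is None:
            yield row  # None passes through unchanged
            continue
        for item in value:
            # One output row per list element; other columns are repeated.
            yield row[:col] + (item,) + row[col + 1:]


rows = [(1, [10, 20]), (2, None), (3, [])]
exploded = list(explode(rows, 1))
```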


UnpivotNode


Bases: PlanNode

Converts columns into rows (melt / unpivot).

Keeps id_columns fixed and emits one output row per entry in value_columns, holding the id columns plus a (variable, value) pair, similar to pandas melt or SQL UNPIVOT.

Parameters:

  • child (PlanNode) –

    Input plan node.

  • id_columns (list[str]) –

    Columns to keep as identifiers.

  • value_columns (list[str]) –

    Columns to unpivot into rows.

  • variable_name (str, default: 'variable' ) –

    Name for the new column holding the original column names.

  • value_name (str, default: 'value' ) –

    Name for the new column holding the values.
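
The reshaping can be sketched as a nested loop: one output row per input row and value column. The helper name unpivot is hypothetical, and the output order (all value columns of the first row, then the second row, and so on) is an assumption.

```python
from typing import Iterable, Iterator


def unpivot(
    rows: Iterable[tuple],
    columns: list[str],
    id_columns: list[str],
    value_columns: list[str],
) -> Iterator[tuple]:
    id_pos = [columns.index(name) for name in id_columns]
    value_pos = [(name, columns.index(name)) for name in value_columns]
    for row in rows:
        ids = tuple(row[i] for i in id_pos)
        for name, pos in value_pos:
            # Output layout: id columns, then (variable, value).
            yield ids + (name, row[pos])


rows = [("x", 1, 2), ("y", 3, 4)]
long_rows = list(unpivot(rows, ["id", "a", "b"], ["id"], ["a", "b"]))
```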


PivotNode


Bases: PlanNode

Converts rows into columns (pivot / spread).

Groups rows by the index columns and spreads the distinct values of the on column into new columns, aggregating the values column with agg_name. Pivot columns are auto-discovered from the data unless columns is explicitly provided.

Parameters:

  • child (PlanNode) –

    Input plan node.

  • index (list[str]) –

    Columns to keep as the row index.

  • on (str) –

    Column whose distinct values become new column headers.

  • values (str) –

    Column containing the values to fill the pivoted cells.

  • agg_name (AggFunc, default: 'first' ) –

    Aggregation function applied when multiple values map to the same cell.

  • columns (list[str] | None, default: None ) –

    Explicit list of pivot columns. Auto-discovered if None.
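
A sketch of the pivot for the default 'first' aggregation only. The helper name pivot_first is hypothetical, and two details are assumptions: auto-discovered pivot columns are ordered by first appearance, and cells with no matching row are filled with None.

```python
from typing import Iterable, Optional


def pivot_first(
    rows: Iterable[tuple],
    columns: list[str],
    index: list[str],
    on: str,
    values: str,
    pivot_columns: Optional[list] = None,
) -> tuple[list, list[tuple]]:
    rows = list(rows)  # discovery needs a full pass over the data
    index_pos = [columns.index(name) for name in index]
    on_pos, value_pos = columns.index(on), columns.index(values)
    if pivot_columns is None:
        # Distinct values of the `on` column, in first-seen order.
        pivot_columns = list(dict.fromkeys(row[on_pos] for row in rows))
    cells = {}
    for row in rows:
        key = tuple(row[i] for i in index_pos)
        # setdefault keeps the first value seen for each cell ('first').
        cells.setdefault(key, {}).setdefault(row[on_pos], row[value_pos])
    header = list(index) + list(pivot_columns)
    body = [
        key + tuple(group.get(col) for col in pivot_columns)
        for key, group in cells.items()
    ]
    return header, body


rows = [("A", "2020", 1), ("A", "2021", 2), ("B", "2020", 3), ("A", "2020", 9)]
header, body = pivot_first(rows, ["city", "year", "sales"], ["city"], "year", "sales")
```

Supplying the pivot columns explicitly skips the discovery pass and fixes the output schema in advance.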


Combine Nodes

UnionNode


Bases: PlanNode

Concatenates rows from multiple input nodes (SQL UNION ALL).

The output schema is taken from the first child. All children must produce rows with compatible schemas.

Parameters:

  • children_nodes (list[PlanNode]) –

    List of input plan nodes to concatenate.
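
UNION ALL semantics amount to chaining the inputs in order without removing duplicates. A minimal sketch, with the hypothetical helper union_all operating on plain row iterables:

```python
from itertools import chain
from typing import Iterable, Iterator


def union_all(inputs: Iterable[Iterable[tuple]]) -> Iterator[tuple]:
    # Each input is consumed fully, in order, before the next one starts.
    return chain.from_iterable(inputs)


combined = list(union_all([[(1, "a")], [(1, "a"), (2, "b")], []]))
```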


Window Nodes

WindowNode


Bases: PlanNode

Evaluates a window function and appends the result as a new column.

Implements the sort-partition-scan pattern: materialises all rows, partitions by key columns, sorts within each partition, then computes the window expression (rank, cumulative, offset, or aggregate).

Parameters:

  • child (PlanNode) –

    Input plan node.

  • window_expr (WindowExpr) –

    The window expression to evaluate.

  • output_name (str) –

    Name for the new column containing window results.
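
The materialise, partition, sort, and scan steps can be sketched for a single window function, a row number within each partition. The helper name row_number_window is hypothetical. Returning rows in their original input order is an assumption of this example, not something stated above.

```python
from collections import defaultdict
from typing import Iterable


def row_number_window(
    rows: Iterable[tuple], partition_pos: int, order_pos: int
) -> list[tuple]:
    rows = list(rows)  # materialise all rows
    # Partition: remember the positions of the rows in each partition.
    partitions = defaultdict(list)
    for pos, row in enumerate(rows):
        partitions[row[partition_pos]].append(pos)
    # Sort within each partition, then scan to assign 1-based row numbers.
    numbers = [0] * len(rows)
    for positions in partitions.values():
        positions.sort(key=lambda p: rows[p][order_pos])
        for n, pos in enumerate(positions, start=1):
            numbers[pos] = n
    # Append the window result as a new trailing column.
    return [row + (numbers[i],) for i, row in enumerate(rows)]


rows = [("a", 30), ("b", 5), ("a", 10), ("a", 20), ("b", 1)]
numbered = row_number_window(rows, partition_pos=0, order_pos=1)
```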


Optimizer


Rule-based query optimizer.

Applies two optimization passes to a query plan:

  1. Filter pushdown — moves filter nodes closer to data sources, reducing the number of rows processed by downstream operations.
  2. Column pruning — removes unused columns early in the plan, reducing memory usage.

Methods:

  • optimize

    Apply all optimization passes to a plan tree.

optimize

optimize(plan: PlanNode) -> PlanNode

Apply all optimization passes to a plan tree.

Parameters:

  • plan (PlanNode) –

    Root node of the query plan to optimize.

Returns:

  • PlanNode

    A new, semantically equivalent plan tree with optimizations applied.
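
To illustrate the filter pushdown pass, here is a toy rewrite rule over a hypothetical three-node plan (Scan, Project, Filter as frozen dataclasses). It covers a single case, moving a filter below a pure column selection, and says nothing about which rules the real optimizer implements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    name: str


@dataclass(frozen=True)
class Project:
    child: object
    columns: tuple


@dataclass(frozen=True)
class Filter:
    child: object
    predicate: str  # opaque label standing in for an expression


def push_filters_down(plan):
    """Return a new plan with filters moved below column selections."""
    if isinstance(plan, Filter):
        child = push_filters_down(plan.child)
        if isinstance(child, Project):
            # A pure column selection does not change values, and the filter
            # only references columns the projection keeps, so the filter
            # can run first on the projection's input.
            pushed = push_filters_down(Filter(child.child, plan.predicate))
            return Project(pushed, child.columns)
        return Filter(child, plan.predicate)
    if isinstance(plan, Project):
        return Project(push_filters_down(plan.child), plan.columns)
    return plan  # leaf nodes are returned as-is


plan = Filter(Project(Project(Scan("t"), ("a", "b")), ("a",)), "a > 1")
optimized = push_filters_down(plan)
```

Because the nodes are immutable, the rewrite builds a new tree and leaves the original plan untouched, in line with optimize returning a new plan.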