LazyFrame

The LazyFrame is the central object in pyfloe. It represents a lazy dataframe — operations build a query plan without executing it. Data flows only when you trigger evaluation with .collect(), .to_pylist(), .to_csv(), or similar methods.


LazyFrame

A lazy, composable dataframe.

Operations on a LazyFrame build a query plan without executing it. Data flows only when you call a materialization method like .collect(), .to_pylist(), or .to_csv().

Examples:

>>> from pyfloe import LazyFrame, col
>>> lf = LazyFrame([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
>>> lf.filter(col("age") > 28).to_pylist()
[{'name': 'Alice', 'age': 30}]
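To make the "build a plan first, run it later" idea concrete, here is a minimal sketch in plain Python. It is not pyfloe's implementation: the `TinyLazy` class and its methods are hypothetical stand-ins that only mimic the behaviour described above, using generators to defer work.

```python
from typing import Callable, Iterable, Iterator


class TinyLazy:
    """Hypothetical stand-in for a lazy frame: records steps, runs them later."""

    def __init__(self, rows: Iterable[dict], steps: tuple = ()):
        self._rows = rows
        self._steps = steps  # the "query plan": a tuple of row-stream transforms

    def filter(self, predicate: Callable[[dict], bool]) -> "TinyLazy":
        # Nothing runs here; we only append a step to the plan.
        step = lambda rows: (r for r in rows if predicate(r))
        return TinyLazy(self._rows, self._steps + (step,))

    def select(self, *names: str) -> "TinyLazy":
        step = lambda rows: ({n: r[n] for n in names} for r in rows)
        return TinyLazy(self._rows, self._steps + (step,))

    def to_pylist(self) -> list:
        # Materialization: only now does data flow through the recorded steps.
        stream: Iterator[dict] = iter(self._rows)
        for step in self._steps:
            stream = step(stream)
        return list(stream)


calls = []


def is_older_than_28(row: dict) -> bool:
    calls.append(row["name"])  # record when the predicate actually runs
    return row["age"] > 28


people = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
plan = TinyLazy(people).filter(is_older_than_28).select("name")
calls_before = list(calls)  # still empty: building the plan touched no rows
result = plan.to_pylist()   # the predicate runs only here
```

Building `plan` never calls the predicate; it runs once per row only when `to_pylist()` is called.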

Methods:

  • __init__

    Create a LazyFrame from in-memory data.

  • explain

    Return a string representation of the query plan tree.

  • print_explain

    Print the query plan tree to stdout.

  • select

    Select columns by name or expression.

  • filter

    Filter rows matching a predicate expression.

  • with_column

    Add a computed column to the LazyFrame.

  • with_columns

    Add multiple computed columns at once.

  • drop

    Remove columns from the LazyFrame.

  • rename

    Rename columns.

  • sort

    Sort rows by one or more columns.

  • join

    Join with another LazyFrame.

  • group_by

    Group by one or more columns.

  • explode

    Unnest a list column into separate rows.

  • pivot

    Pivot (reshape long to wide).

  • unpivot

    Unpivot (reshape wide to long). Also available as .melt().

  • union

    Stack rows from another LazyFrame below this one.

  • apply

    Apply a function to column values.

  • read

    Alias for select(). Select columns by name.

  • head

    Return a lazy view of the first n rows.

  • optimize

    Return a new LazyFrame with an optimized query plan.

  • collect

    Materialize the query plan and cache the results.

  • count

    Return the total number of rows.

  • to_pylist

    Materialize and return data as a list of dicts.

  • to_pydict

    Materialize and return data as a dict of column lists.

  • to_tuples

    Materialize and return data as a list of tuples.

  • to_batches

    Materialize and return data in batches of dicts.

  • to_csv

    Stream the query plan to a CSV file with constant memory.

  • to_tsv

    Stream the query plan to a TSV (tab-separated) file.

  • to_jsonl

    Stream the query plan to a JSON Lines file.

  • to_json

    Write data as a JSON array.

  • to_parquet

    Write data to a Parquet file (requires pyarrow).

  • display

    Print a formatted table of the first n rows.

  • typed

    Wrap this LazyFrame as a TypedLazyFrame for IDE-friendly typed results.

  • validate

    Validate the schema against a TypedDict type.

Attributes:

  • schema (LazySchema) –

    Output schema of this LazyFrame, computed without touching data.

  • columns (list[str]) –

    List of column names.

  • dtypes (dict[str, type]) –

    Mapping of column names to their Python types.

  • is_materialized (bool) –

    Whether the query plan has been executed and data is cached.

schema property

schema: LazySchema

Output schema of this LazyFrame, computed without touching data.

Examples:

>>> lf = LazyFrame([{"name": "Alice", "age": 30}])
>>> lf.schema.column_names
['name', 'age']
>>> lf.schema.dtypes
{'name': <class 'str'>, 'age': <class 'int'>}

columns property

columns: list[str]

List of column names.

Examples:

>>> LazyFrame([{"x": 1, "y": 2}]).columns
['x', 'y']

dtypes property

dtypes: dict[str, type]

Mapping of column names to their Python types.

Examples:

>>> LazyFrame([{"name": "Alice", "age": 30}]).dtypes
{'name': <class 'str'>, 'age': <class 'int'>}

is_materialized property

is_materialized: bool

Whether the query plan has been executed and data is cached.

Returns True after calling collect() or to_pylist().

__init__

__init__(raw_data: list[dict] | list | dict | Iterable | None = None, *, name: str | None = None) -> None

Create a LazyFrame from in-memory data.

Parameters:

  • raw_data (list[dict] | list | dict | Iterable | None, default: None ) –

    Input data as a list of dicts, list of tuples, list of objects with __dict__, a dict of columns, or an iterable.

  • name (str | None, default: None ) –

    Optional name for the LazyFrame.

Examples:

From a list of dicts:

>>> lf = LazyFrame([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
>>> lf.columns
['name', 'age']

From a dict of columns:

>>> lf = LazyFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
>>> lf.to_pylist()
[{'x': 1, 'y': 'a'}, {'x': 2, 'y': 'b'}, {'x': 3, 'y': 'c'}]

From a list of tuples (auto-named columns):

>>> lf = LazyFrame([(1, "a"), (2, "b")])
>>> lf.columns
['col_0', 'col_1']

explain

explain(optimized: bool = False) -> str

Return a string representation of the query plan tree.

Parameters:

  • optimized (bool, default: False ) –

    If True, show the plan after optimization (filter pushdown, column pruning).

Examples:

>>> lf = LazyFrame([{"a": 1}]).filter(col("a") > 0).select("a")
>>> print(lf.explain())
Project [a]
  Filter [(col("a") > 0)]
    Scan [a] (1 rows)

print_explain

print_explain(optimized: bool = False) -> None

Print the query plan tree to stdout.

Shortcut for print(lf.explain()).

Parameters:

  • optimized (bool, default: False ) –

    If True, show the plan after optimization.

select

select(*args: str | Expr) -> LazyFrame

Select columns by name or expression.

Parameters:

  • *args (str | Expr, default: () ) –

    Column names as strings, or Expr objects.

Returns:

  • LazyFrame

    A new LazyFrame with only the selected columns.

Examples:

Select by column name:

>>> lf = LazyFrame([{"a": 1, "b": 2, "c": 3}])
>>> lf.select("a", "c").to_pylist()
[{'a': 1, 'c': 3}]

Select with expressions:

>>> lf.select(col("a"), (col("b") + col("c")).alias("sum")).to_pylist()
[{'a': 1, 'sum': 5}]

filter

filter(predicate_or_col=None, _filter=None, **kwargs) -> LazyFrame

Filter rows matching a predicate expression.

Parameters:

  • predicate_or_col

    An Expr that evaluates to a boolean per row, e.g. col("amount") > 100.

Returns:

  • LazyFrame

    A new LazyFrame with only matching rows.

Examples:

>>> orders = LazyFrame([
...     {"product": "A", "amount": 250, "region": "EU"},
...     {"product": "B", "amount": 75,  "region": "US"},
...     {"product": "C", "amount": 180, "region": "EU"},
... ])
>>> orders.filter(col("amount") > 100).to_pylist()
[{'product': 'A', 'amount': 250, 'region': 'EU'}, {'product': 'C', 'amount': 180, 'region': 'EU'}]

Compound filters with & and |:

>>> orders.filter((col("region") == "EU") & (col("amount") > 200)).to_pylist()
[{'product': 'A', 'amount': 250, 'region': 'EU'}]

with_column

with_column(name_or_expr: str | Expr, expr: Expr | None = None) -> LazyFrame

Add a computed column to the LazyFrame.

Can be called with a name and expression, or with a single expression whose output name is derived via .alias() or from the underlying column reference.

Parameters:

  • name_or_expr (str | Expr) –

    Column name (str) or an expression with an inferrable output name.

  • expr (Expr | None, default: None ) –

    Expression to compute the column values (required when name_or_expr is a string).

Returns:

  • LazyFrame

    A new LazyFrame with the additional column.

Examples:

>>> lf = LazyFrame([{"price": 100}, {"price": 200}])
>>> lf.with_column("tax", col("price") * 0.2).to_pylist()
[{'price': 100, 'tax': 20.0}, {'price': 200, 'tax': 40.0}]
>>> lf.with_column((col("price") * 0.2).alias("tax")).to_pylist()
[{'price': 100, 'tax': 20.0}, {'price': 200, 'tax': 40.0}]

with_columns

with_columns(*args: Expr, **kwargs: Expr) -> LazyFrame

Add multiple computed columns at once.

Accepts positional expressions (with names derived from .alias() or the underlying column) and/or keyword arguments.

Parameters:

  • *args (Expr, default: () ) –

    Expressions with inferrable output names.

  • **kwargs (Expr, default: {} ) –

    Column name to expression mappings.

Returns:

  • LazyFrame

    A new LazyFrame with the additional columns.

Examples:

>>> lf = LazyFrame([{"amount": 250, "region": "eu"}])
>>> lf.with_columns(
...     (col("amount") * 2).alias("double"),
...     col("region").str.upper().alias("upper_region"),
... ).to_pylist()
[{'amount': 250, 'region': 'eu', 'double': 500, 'upper_region': 'EU'}]
>>> lf.with_columns(
...     double=col("amount") * 2,
...     upper_region=col("region").str.upper(),
... ).to_pylist()
[{'amount': 250, 'region': 'eu', 'double': 500, 'upper_region': 'EU'}]

drop

drop(*columns: str) -> LazyFrame

Remove columns from the LazyFrame.

Parameters:

  • *columns (str, default: () ) –

    Column names to drop.

Returns:

  • LazyFrame

    A new LazyFrame without the specified columns.

Examples:

>>> lf = LazyFrame([{"a": 1, "b": 2, "c": 3}])
>>> lf.drop("b", "c").columns
['a']

rename

rename(mapping: dict[str, str]) -> LazyFrame

Rename columns.

Parameters:

  • mapping (dict[str, str]) –

    Old name to new name mapping.

Returns:

  • LazyFrame

    A new LazyFrame with renamed columns.

Examples:

>>> lf = LazyFrame([{"amount": 100, "region": "EU"}])
>>> lf.rename({"amount": "price", "region": "area"}).columns
['price', 'area']

sort

sort(*by: str, ascending: bool | list[bool] = True) -> LazyFrame

Sort rows by one or more columns.

Parameters:

  • *by (str, default: () ) –

    Column names to sort by.

  • ascending (bool | list[bool], default: True ) –

    Sort direction. A single bool applies to all columns; a list specifies per-column direction.

Returns:

  • LazyFrame

    A new LazyFrame with sorted rows.

Examples:

>>> lf = LazyFrame([{"name": "C"}, {"name": "A"}, {"name": "B"}])
>>> lf.sort("name").to_pylist()
[{'name': 'A'}, {'name': 'B'}, {'name': 'C'}]

Descending sort:

>>> lf.sort("name", ascending=False).to_pylist()
[{'name': 'C'}, {'name': 'B'}, {'name': 'A'}]
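The examples above use a single direction for all columns. As a rough illustration of what a per-column `ascending` list means, the sketch below reproduces the semantics in plain Python with stable sorts. The `sort_rows` helper is hypothetical and is not part of pyfloe; going by the signature, the corresponding call would presumably be `lf.sort("region", "amount", ascending=[True, False])`.

```python
from operator import itemgetter

rows = [
    {"region": "EU", "amount": 180},
    {"region": "US", "amount": 320},
    {"region": "EU", "amount": 250},
]


def sort_rows(rows, by, ascending=True):
    """Sort by several keys with a per-key direction, using stable sorts."""
    if isinstance(ascending, bool):
        ascending = [ascending] * len(by)  # one flag applies to every key
    out = list(rows)
    # Sort by the least significant key first; stability keeps earlier order.
    for name, asc in reversed(list(zip(by, ascending))):
        out.sort(key=itemgetter(name), reverse=not asc)
    return out


# region ascending, then amount descending within each region
sorted_rows = sort_rows(rows, ["region", "amount"], [True, False])
```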

join

join(other: LazyFrame, on: str | list[str] | None = None, left_on: str | list[str] | None = None, right_on: str | list[str] | None = None, how: JoinHow = 'inner', sorted: bool = False, left_cols: str | list[str] | None = None, right_cols: str | list[str] | None = None) -> LazyFrame

Join with another LazyFrame.

Parameters:

  • other (LazyFrame) –

    Right-side LazyFrame to join with.

  • on (str | list[str] | None, default: None ) –

    Column name(s) present in both sides.

  • left_on (str | list[str] | None, default: None ) –

    Column name(s) on the left side.

  • right_on (str | list[str] | None, default: None ) –

    Column name(s) on the right side.

  • how (JoinHow, default: 'inner' ) –

    Join type — 'inner', 'left', or 'full'.

  • sorted (bool, default: False ) –

    If True, use sort-merge join (O(1) memory for pre-sorted inputs) instead of hash join.

Returns:

  • LazyFrame

    A new LazyFrame with columns from both sides.

Examples:

>>> orders = LazyFrame([{"id": 1, "cust": 101}, {"id": 2, "cust": 102}])
>>> customers = LazyFrame([{"cust": 101, "name": "Alice"}])
>>> orders.join(customers, on="cust", how="left").to_pylist()
[{'id': 1, 'cust': 101, 'right_cust': 101, 'name': 'Alice'}, {'id': 2, 'cust': 102, 'right_cust': None, 'name': None}]

Different key names on each side:

>>> left = LazyFrame([{"order_id": 1, "customer_id": 10}])
>>> right = LazyFrame([{"cid": 10, "name": "Alice"}])
>>> left.join(right, left_on="customer_id", right_on="cid").to_pylist()
[{'order_id': 1, 'customer_id': 10, 'cid': 10, 'name': 'Alice'}]
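The `sorted=True` option refers to a sort-merge join. The following is a minimal sketch of that technique in plain Python, not pyfloe's code: `merge_join` is a hypothetical helper that assumes both inputs are already sorted by the key and that keys are unique on the right side. Unlike the pyfloe output shown above, it merges the key into one column instead of adding a `right_`-prefixed copy.

```python
def merge_join(left, right, key):
    """Inner sort-merge join of two key-sorted row iterables.

    Assumes unique keys on the right side, so only one row per side
    is held in memory at any time.
    """
    left_it, right_it = iter(left), iter(right)
    l, r = next(left_it, None), next(right_it, None)
    while l is not None and r is not None:
        if l[key] < r[key]:
            l = next(left_it, None)   # left is behind: advance left
        elif l[key] > r[key]:
            r = next(right_it, None)  # right is behind: advance right
        else:
            yield {**l, **r}
            l = next(left_it, None)   # keep r: the next left row may share the key


orders = [
    {"id": 1, "cust": 101},
    {"id": 3, "cust": 101},
    {"id": 2, "cust": 102},
]
customers = [{"cust": 101, "name": "Alice"}, {"cust": 103, "name": "Cara"}]
joined = list(merge_join(orders, customers, "cust"))
```

A hash join would instead load one side into a dict keyed by `cust`, which works on unsorted input but needs memory proportional to that side.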

group_by

group_by(*columns: str, sorted: bool = False, **legacy_kwargs) -> LazyGroupBy | LazyFrame

Group by one or more columns.

Parameters:

  • *columns (str, default: () ) –

    Column names to group by.

  • sorted (bool, default: False ) –

    If True, use streaming sorted aggregation (requires input sorted by group columns).

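Returns:

  • LazyGroupBy | LazyFrame

    A LazyGroupBy builder; call .agg() on it to produce the aggregated LazyFrame.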

Examples:

>>> orders = LazyFrame([
...     {"region": "EU", "amount": 250},
...     {"region": "EU", "amount": 180},
...     {"region": "US", "amount": 320},
... ])
>>> orders.group_by("region").agg(
...     col("amount").sum().alias("total"),
... ).sort("region").to_pylist()
[{'region': 'EU', 'total': 430}, {'region': 'US', 'total': 320}]
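The `sorted=True` option describes streaming aggregation over input that is already ordered by the group columns. A minimal sketch of that idea with `itertools.groupby` follows; `sorted_sum` is a hypothetical helper, not pyfloe's implementation, and it only handles a single group column and a sum.

```python
from itertools import groupby
from operator import itemgetter


def sorted_sum(rows, key, value):
    """Sum `value` per `key`, assuming rows arrive already sorted by `key`.

    Only the current group is held in memory; each result is emitted as
    soon as the key changes.
    """
    for group_key, group_rows in groupby(rows, key=itemgetter(key)):
        yield {key: group_key, "total": sum(r[value] for r in group_rows)}


orders = [
    {"region": "EU", "amount": 250},
    {"region": "EU", "amount": 180},
    {"region": "US", "amount": 320},
]
totals = list(sorted_sum(iter(orders), "region", "amount"))
```

If the input is not sorted by the group column, this approach emits the same key more than once, which is why the parameter requires sorted input.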

explode

explode(column: str) -> LazyFrame

Unnest a list column into separate rows.

Each element in the list becomes its own row, with all other column values duplicated.

Parameters:

  • column (str) –

    Name of the column containing lists.

Returns:

  • LazyFrame

    A new LazyFrame with one row per list element.

Examples:

>>> lf = LazyFrame([
...     {"id": 1, "tags": ["a", "b"]},
...     {"id": 2, "tags": ["c"]},
... ])
>>> lf.explode("tags").to_pylist()
[{'id': 1, 'tags': 'a'}, {'id': 1, 'tags': 'b'}, {'id': 2, 'tags': 'c'}]
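For readers who want to see the row duplication spelled out, the same transformation can be sketched in plain Python. `explode_rows` is a hypothetical helper written for illustration only.

```python
def explode_rows(rows, column):
    """Yield one output row per element of the list stored in `column`."""
    for row in rows:
        for item in row[column]:
            # Copy every other column and replace the list with one element.
            yield {**row, column: item}


data = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ["c"]},
]
exploded = list(explode_rows(data, "tags"))
```

In this sketch a row with an empty list produces no output rows; whether pyfloe behaves the same way is not stated here.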

pivot

pivot(index: str | list[str], on: str, values: str, agg: AggFunc = 'first', columns: list[str] | None = None) -> LazyFrame

Pivot (reshape long to wide).

Parameters:

  • index (str | list[str]) –

    Column(s) to keep as row identifiers.

  • on (str) –

    Column whose unique values become new column headers.

  • values (str) –

    Column whose values fill the pivoted cells.

  • agg (AggFunc, default: 'first' ) –

    Aggregation function name ('first', 'sum', etc.).

  • columns (list[str] | None, default: None ) –

    Explicit list of pivot column values (auto-detected if None).

Returns:

  • LazyFrame

    A new LazyFrame in wide format.

Examples:

>>> lf = LazyFrame([
...     {"name": "Alice", "subject": "math", "score": 90},
...     {"name": "Alice", "subject": "english", "score": 85},
...     {"name": "Bob", "subject": "math", "score": 78},
...     {"name": "Bob", "subject": "english", "score": 92},
... ])
>>> lf.pivot(index="name", on="subject", values="score",
...          columns=["math", "english"]).sort("name").to_pylist()
[{'name': 'Alice', 'math': 90, 'english': 85}, {'name': 'Bob', 'math': 78, 'english': 92}]
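The long-to-wide reshape can be sketched in plain Python as follows. This is an illustration of the semantics with the default `'first'` aggregation and a single index column, not pyfloe's implementation; `pivot_first` is a hypothetical helper.

```python
def pivot_first(rows, index, on, values, columns):
    """Reshape long rows to wide rows, keeping the first value per cell."""
    wide = {}     # index value -> output row
    seen = set()  # (index value, pivot value) cells that are already filled
    for row in rows:
        out = wide.setdefault(
            row[index], {index: row[index], **dict.fromkeys(columns)}
        )
        cell = (row[index], row[on])
        if row[on] in columns and cell not in seen:
            seen.add(cell)
            out[row[on]] = row[values]
    return list(wide.values())


scores = [
    {"name": "Alice", "subject": "math", "score": 90},
    {"name": "Alice", "subject": "english", "score": 85},
    {"name": "Bob", "subject": "math", "score": 78},
    {"name": "Bob", "subject": "english", "score": 92},
]
wide_rows = pivot_first(scores, "name", "subject", "score", ["math", "english"])
```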

unpivot

unpivot(id_columns: str | list[str], value_columns: str | list[str] | None = None, variable_name: str = 'variable', value_name: str = 'value') -> LazyFrame

Unpivot (reshape wide to long). Also available as .melt().

Parameters:

  • id_columns (str | list[str]) –

    Column(s) to keep as identifiers.

  • value_columns (str | list[str] | None, default: None ) –

    Column(s) to unpivot. If None, all non-id columns.

  • variable_name (str, default: 'variable' ) –

    Name for the new column holding original column names.

  • value_name (str, default: 'value' ) –

    Name for the new column holding the values.

Returns:

  • LazyFrame

    A new LazyFrame in long format.

Examples:

>>> lf = LazyFrame([
...     {"name": "Alice", "math": 90, "english": 85},
...     {"name": "Bob", "math": 78, "english": 92},
... ])
>>> lf.unpivot("name", ["math", "english"]).sort("name", "variable").to_pylist()
[{'name': 'Alice', 'variable': 'english', 'value': 85}, {'name': 'Alice', 'variable': 'math', 'value': 90}, {'name': 'Bob', 'variable': 'english', 'value': 92}, {'name': 'Bob', 'variable': 'math', 'value': 78}]
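The wide-to-long reshape is the inverse operation. A plain-Python sketch of the semantics, using a hypothetical `unpivot_rows` helper rather than pyfloe itself:

```python
def unpivot_rows(rows, id_columns, value_columns,
                 variable_name="variable", value_name="value"):
    """Yield one long-format row per (input row, value column) pair."""
    for row in rows:
        for name in value_columns:
            out = {c: row[c] for c in id_columns}  # identifiers are repeated
            out[variable_name] = name              # original column name
            out[value_name] = row[name]            # original cell value
            yield out


wide = [
    {"name": "Alice", "math": 90, "english": 85},
    {"name": "Bob", "math": 78, "english": 92},
]
long_rows = list(unpivot_rows(wide, ["name"], ["math", "english"]))
```

The sketch emits rows in input order; the pyfloe example above sorts by `name` and `variable` to get a deterministic order.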

union

union(other: LazyFrame) -> LazyFrame

Stack rows from another LazyFrame below this one.

Both LazyFrames must have the same columns.

Parameters:

  • other (LazyFrame) –

    The LazyFrame whose rows are appended below this one.

Returns:

  • LazyFrame

    A new LazyFrame with rows from both inputs.

Examples:

>>> a = LazyFrame([{"x": 1}, {"x": 2}])
>>> b = LazyFrame([{"x": 3}])
>>> a.union(b).to_pylist()
[{'x': 1}, {'x': 2}, {'x': 3}]

apply

apply(func: Callable, columns: list[str] | None = None, output_dtype: type | None = None) -> LazyFrame

Apply a function to column values.

Parameters:

  • func (Callable) –

    Function to apply to each cell value.

  • columns (list[str] | None, default: None ) –

    Columns to apply to. If None, applies to all columns.

  • output_dtype (type | None, default: None ) –

    Expected output type (for schema inference).

Returns:

  • LazyFrame

    A new LazyFrame with the function applied.

Examples:

Apply to specific columns:

>>> lf = LazyFrame([{"name": "Alice", "age": 30}])
>>> lf.apply(str, columns=["age"]).to_pylist()
[{'name': 'Alice', 'age': '30'}]

Apply to all columns:

>>> lf.apply(str).to_pylist()
[{'name': 'Alice', 'age': '30'}]

read

read(columns: str | list[str]) -> LazyFrame

Alias for select(). Select columns by name.

Parameters:

  • columns (str | list[str]) –

    Column name or list of column names to select.

Returns:

  • LazyFrame

    A new LazyFrame with only the specified columns.

head

head(n: int = 5) -> LazyFrame

Return a lazy view of the first n rows.

The upstream plan is not executed until the result is materialized (e.g. via collect(), to_pylist(), etc.).

Parameters:

  • n (int, default: 5 ) –

    Maximum number of rows to return.

Returns:

  • LazyFrame

    A new LazyFrame with a Limit node in the plan.

Examples:

>>> lf = LazyFrame([{"x": i} for i in range(100)])
>>> lf.head(3).to_pylist()
[{'x': 0}, {'x': 1}, {'x': 2}]
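The benefit of a lazy limit is that the upstream source only has to produce as many rows as are requested. The sketch below demonstrates the effect with `itertools.islice` on a plain generator; it illustrates the principle and makes no claim about how pyfloe's Limit node is implemented.

```python
from itertools import islice

pulled = []


def source():
    """A 100-row source that records which rows were actually produced."""
    for i in range(100):
        pulled.append(i)
        yield {"x": i}


# Take the first three rows; the generator is never advanced past them.
first_three = list(islice(source(), 3))
```

After this runs, `pulled` contains only three entries even though the source could produce 100 rows.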

optimize

optimize() -> LazyFrame

Return a new LazyFrame with an optimized query plan.

Applies filter pushdown and column pruning.

Returns:

  • LazyFrame

    A new LazyFrame wrapping the optimized plan.

Examples:

>>> lf = LazyFrame([{"a": 1, "b": 2}]).select("a").filter(col("a") > 0)
>>> opt = lf.optimize()
>>> opt.to_pylist()
[{'a': 1}]
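To illustrate what filter pushdown means for a plan like the one above, here is a toy rewrite rule over a hypothetical plan tree. The node names mirror those printed by explain() (Project, Filter, Scan), but the classes and the `push_down` function are assumptions made for this sketch and do not reflect pyfloe's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    columns: tuple


@dataclass(frozen=True)
class Project:
    columns: tuple
    child: object


@dataclass(frozen=True)
class Filter:
    column: str  # the single column the predicate reads
    child: object


def push_down(plan):
    """Move a Filter below a Project when the Project keeps the filtered column."""
    if isinstance(plan, Filter) and isinstance(plan.child, Project):
        project = plan.child
        if plan.column in project.columns:
            pushed = push_down(Filter(plan.column, project.child))
            return Project(project.columns, pushed)
    return plan


# select("a") followed by filter(col("a") > 0): Filter sits above Project.
plan = Filter("a", Project(("a",), Scan(("a", "b"))))
optimized = push_down(plan)
```

After the rewrite the filter runs directly on the scan, so rows are discarded before the projection does any work.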

collect

collect(optimize: bool = True) -> LazyFrame

Materialize the query plan and cache the results.

After calling collect, subsequent operations use the cached data. Calling collect multiple times is safe and idempotent.

Parameters:

  • optimize (bool, default: True ) –

    If True, run the query optimizer first.

Returns:

  • LazyFrame

    Self, with data materialized.

Examples:

>>> lf = LazyFrame([{"x": 1}, {"x": 2}]).filter(col("x") > 0)
>>> lf.is_materialized
False
>>> lf.collect()
LazyFrame [2 rows × 1 cols] (materialized)
...
>>> lf.is_materialized
True

count

count(optimize: bool = True) -> int

Return the total number of rows.

Uses fast-path counting when possible (e.g. for in-memory data) without materializing all rows.

Parameters:

  • optimize (bool, default: True ) –

    If True, run the query optimizer first.

Returns:

  • int

    The row count as an integer.

Examples:

>>> LazyFrame([{"x": 1}, {"x": 2}, {"x": 3}]).count()
3

to_pylist

to_pylist() -> list[dict]

Materialize and return data as a list of dicts.

Examples:

>>> LazyFrame([{"a": 1, "b": 2}]).to_pylist()
[{'a': 1, 'b': 2}]

to_pydict

to_pydict() -> dict[str, list]

Materialize and return data as a dict of column lists.

Examples:

>>> LazyFrame([{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]).to_pydict()
{'x': [1, 2], 'y': ['a', 'b']}

to_tuples

to_tuples() -> list[tuple]

Materialize and return data as a list of tuples.

Examples:

>>> LazyFrame([{"x": 1, "y": "a"}]).to_tuples()
[(1, 'a')]

to_batches

to_batches(optimize: bool = True) -> Iterator[list[dict]]

Materialize and return data in batches of dicts.

Parameters:

  • optimize (bool, default: True ) –

    If True, run the query optimizer first.

Yields:

  • list[dict]

    Batches of rows, each represented as a list of dicts.
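The shape of the yielded values can be pictured with a small batching generator. This is only a sketch: `batched_rows` and its `batch_size` parameter are hypothetical, since the signature above does not expose a batch size.

```python
from itertools import islice


def batched_rows(rows, batch_size):
    """Yield lists of at most `batch_size` rows from any row iterable."""
    it = iter(rows)
    # islice pulls the next chunk; an empty list means the input is exhausted.
    while batch := list(islice(it, batch_size)):
        yield batch


batches = list(batched_rows(({"x": i} for i in range(5)), batch_size=2))
```

Consuming one batch at a time keeps memory bounded by the batch size instead of the full result.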

to_csv

to_csv(path: str, *, delimiter: str = ',', header: bool = True, encoding: str = 'utf-8') -> None

Stream the query plan to a CSV file with constant memory.

Data is written row-by-row without buffering the entire dataset, so this works for arbitrarily large pipelines.

Parameters:

  • path (str) –

    Output file path.

  • delimiter (str, default: ',' ) –

    Field delimiter character.

  • header (bool, default: True ) –

    Whether to write a header row.

  • encoding (str, default: 'utf-8' ) –

    File encoding.

Examples:

>>> lf = LazyFrame([{"name": "Alice", "age": 30}])
>>> lf.to_csv("output.csv")
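Row-by-row writing with constant memory can be sketched with the standard `csv` module. The `stream_to_csv` helper below is hypothetical; it borrows the parameter names from the signature above but is not pyfloe's implementation.

```python
import csv
import os
import tempfile


def stream_to_csv(rows, path, delimiter=",", header=True, encoding="utf-8"):
    """Write rows from any iterable to CSV, holding one row at a time."""
    it = iter(rows)
    first = next(it, None)
    if first is None:
        return  # in this sketch, empty input writes no file
    with open(path, "w", newline="", encoding=encoding) as f:
        writer = csv.DictWriter(f, fieldnames=list(first), delimiter=delimiter)
        if header:
            writer.writeheader()
        writer.writerow(first)
        for row in it:  # rows are pulled and written one by one
            writer.writerow(row)


people = ({"name": n, "age": a} for n, a in [("Alice", 30), ("Bob", 25)])
path = os.path.join(tempfile.mkdtemp(), "output.csv")
stream_to_csv(people, path)

with open(path, newline="", encoding="utf-8") as f:
    written_lines = f.read().splitlines()
```

Passing `delimiter="\t"` to the same helper produces tab-separated output.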

to_tsv

to_tsv(path: str, **kwargs: Any) -> None

Stream the query plan to a TSV (tab-separated) file.

Equivalent to lf.to_csv(path, delimiter='\t').

Parameters:

  • path (str) –

    Output file path.

  • **kwargs (Any, default: {} ) –

    Additional arguments passed to to_csv().

to_jsonl

to_jsonl(path: str, *, encoding: str = 'utf-8') -> None

Stream the query plan to a JSON Lines file.

Parameters:

  • path (str) –

    Output file path.

  • encoding (str, default: 'utf-8' ) –

    File encoding.

to_json

to_json(path: str, *, encoding: str = 'utf-8', indent: int | None = None) -> None

Write data as a JSON array.

Parameters:

  • path (str) –

    Output file path.

  • encoding (str, default: 'utf-8' ) –

    File encoding.

  • indent (int | None, default: None ) –

    JSON indentation level.
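The two JSON writers differ in file layout: JSON Lines puts one object per line, while a JSON array wraps all rows in a single document. The snippet below shows both layouts with the standard `json` module, writing to in-memory strings rather than through pyfloe.

```python
import io
import json

rows = [{"x": 1}, {"x": 2}]

# JSON Lines: one self-contained JSON object per line, easy to append and stream.
buffer = io.StringIO()
for row in rows:
    buffer.write(json.dumps(row) + "\n")
jsonl_text = buffer.getvalue()

# JSON array: a single document containing every row.
array_text = json.dumps(rows)
```

Each JSON Lines record can be parsed on its own with `json.loads(line)`, whereas the array has to be parsed as a whole.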

to_parquet

to_parquet(path: str, **kwargs: Any) -> None

Write data to a Parquet file (requires pyarrow).

Parameters:

  • path (str) –

    Output file path.

  • **kwargs (Any, default: {} ) –

    Additional arguments passed to pyarrow.

display

display(n: int = 20, max_col_width: int = 30, optimize: bool = True) -> None

Print a formatted table of the first n rows.

Parameters:

  • n (int, default: 20 ) –

    Maximum number of rows to display.

  • max_col_width (int, default: 30 ) –

    Truncate cell values longer than this.

  • optimize (bool, default: True ) –

    If True, run the query optimizer first.

Examples:

>>> lf = LazyFrame([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
>>> lf.display()
name  | age
------+----
Alice | 30
Bob   | 25

typed

typed(row_type: type[T]) -> TypedLazyFrame[T]

Wrap this LazyFrame as a TypedLazyFrame for IDE-friendly typed results.

Operations that preserve the schema (filter, sort, head) return a TypedLazyFrame, so .to_pylist() returns list[T] in type checkers.

Parameters:

  • row_type (type[T]) –

    A TypedDict class describing the row schema.

Returns:

  • TypedLazyFrame[T]

    A TypedLazyFrame wrapping the same query plan.

Examples:

>>> from typing import TypedDict
>>> class Order(TypedDict):
...     order_id: int
...     amount: float
>>> orders = LazyFrame([{"order_id": 1, "amount": 99.9}]).typed(Order)
>>> isinstance(orders, TypedLazyFrame)
True

validate

validate(row_type: type) -> LazyFrame

Validate the schema against a TypedDict type.

Parameters:

  • row_type (type) –

    A TypedDict class. Each key is checked against the LazyFrame's schema for presence and type compatibility.

Returns:

  • LazyFrame

    The LazyFrame, returned when the schema matches the TypedDict.

Raises:

  • TypeError

    If the schema doesn't match the TypedDict.

Examples:

>>> from typing import TypedDict
>>> class Order(TypedDict):
...     order_id: int
...     amount: float
>>> LazyFrame([{"order_id": 1, "amount": 9.9}]).validate(Order)
LazyFrame [1 rows × 2 cols]
...
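The presence and type check described above can be approximated with `typing.get_type_hints`. This is a minimal sketch under the assumption that the schema is available as a plain `dict[str, type]` (like the `dtypes` attribute) and that all annotations are simple classes; `check_schema` is a hypothetical helper, not pyfloe's validation logic.

```python
from typing import TypedDict, get_type_hints


class Order(TypedDict):
    order_id: int
    amount: float


def check_schema(dtypes: dict, row_type: type) -> None:
    """Raise TypeError if a TypedDict key is missing or has the wrong type."""
    for name, expected in get_type_hints(row_type).items():
        if name not in dtypes:
            raise TypeError(f"missing column: {name}")
        if not issubclass(dtypes[name], expected):
            raise TypeError(
                f"column {name}: expected {expected.__name__}, "
                f"got {dtypes[name].__name__}"
            )


check_schema({"order_id": int, "amount": float}, Order)  # passes silently

try:
    check_schema({"order_id": int, "amount": str}, Order)
    error = None
except TypeError as exc:
    error = str(exc)
```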

TypedLazyFrame

The TypedLazyFrame wraps a LazyFrame with a known row type (a TypedDict), enabling static type checkers and IDEs to infer the shape of results from .to_pylist().

TypedLazyFrame

Bases: LazyFrame, Generic[T]

A LazyFrame with a known row type for static type checking.

Created via LazyFrame.typed(MyTypedDict). Operations that preserve the schema (filter, sort, head) return a TypedLazyFrame, so .to_pylist() returns list[T] in type checkers.


LazyGroupBy

The LazyGroupBy is created by calling LazyFrame.group_by(). Call .agg() on it to specify aggregation expressions and produce the grouped result.

LazyGroupBy

Builder for grouped aggregation operations.

Created by calling LazyFrame.group_by(). Use .agg() to specify aggregation expressions and produce a result LazyFrame.

Methods:

  • agg

    Apply aggregation expressions to each group.

agg

agg(*agg_exprs: AggExpr) -> LazyFrame

Apply aggregation expressions to each group.

Parameters:

  • *agg_exprs (AggExpr, default: () ) –

    One or more aggregation expressions, e.g. col("amount").sum().alias("total").

Returns:

  • LazyFrame

    A new LazyFrame with one row per group.

Raises:

  • TypeError

    If any argument is not an AggExpr.

Examples:

>>> import pyfloe as pf
>>> orders = pf.LazyFrame([
...     {"region": "EU", "amount": 250},
...     {"region": "EU", "amount": 180},
...     {"region": "US", "amount": 320},
... ])
>>> orders.group_by("region").agg(
...     pf.col("amount").sum().alias("total"),
...     pf.col("amount").count().alias("n"),
... ).sort("region").to_pylist()
[{'region': 'EU', 'total': 430, 'n': 2}, {'region': 'US', 'total': 320, 'n': 1}]