LazyFrame¶

The LazyFrame is the central object in pyfloe. It represents a lazy dataframe — operations build a query plan without executing it. Data flows only when you trigger evaluation with .collect(), .to_pylist(), .to_csv(), or similar methods.

LazyFrame ¶

A lazy, composable dataframe.

Operations on a LazyFrame build a query plan without executing it. Data flows only when you call a materialization method like .collect(), .to_pylist(), or .to_csv().

Examples:

>>> lf = LazyFrame([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
>>> lf.filter(col("age") > 28).to_pylist()
[{'name': 'Alice', 'age': 30}]
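
To make the deferred-execution idea concrete, here is a minimal plain-Python sketch. It is not pyfloe's implementation: `ToyLazyRows`, `_keep`, and `_project` are hypothetical names, used only to show how chained calls can record steps while the source stays unread until a materialization call.

```python
from typing import Any, Callable, Iterable, Iterator

Row = dict[str, Any]


def _keep(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    # Generator: yields matching rows one at a time, on demand.
    for row in rows:
        if predicate(row):
            yield row


def _project(rows: Iterable[Row], names: tuple) -> Iterator[Row]:
    for row in rows:
        yield {name: row[name] for name in names}


class ToyLazyRows:
    """Hypothetical toy: methods record steps, to_pylist() runs them."""

    def __init__(self, source: Iterable[Row], steps: tuple = ()) -> None:
        self._source = source
        self._steps = steps  # the recorded "plan"

    def filter(self, predicate: Callable[[Row], bool]) -> "ToyLazyRows":
        return ToyLazyRows(self._source, self._steps + ((_keep, predicate),))

    def select(self, *names: str) -> "ToyLazyRows":
        return ToyLazyRows(self._source, self._steps + ((_project, names),))

    def to_pylist(self) -> list:
        rows: Iterable[Row] = iter(self._source)
        for step, arg in self._steps:
            rows = step(rows, arg)  # chain generators; still nothing is read
        return list(rows)  # rows flow only here


seen: list = []


def source() -> Iterator[Row]:
    for row in [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]:
        seen.append(row["name"])  # record every row actually read
        yield row


plan = ToyLazyRows(source()).filter(lambda r: r["age"] > 28).select("name")
seen_before = list(seen)   # still empty: building the plan read nothing
result = plan.to_pylist()  # materialization pulls rows through the steps
```

Here `seen_before` stays empty because chaining `filter` and `select` only appends to the recorded steps.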

Methods:

__init__ –

Create a LazyFrame from in-memory data.
explain –

Return a string representation of the query plan tree.
print_explain –

Print the query plan tree to stdout.
select –

Select columns by name or expression.
filter –

Filter rows matching a predicate expression.
with_column –

Add a computed column to the LazyFrame.
with_columns –

Add multiple computed columns at once.
drop –

Remove columns from the LazyFrame.
rename –

Rename columns.
sort –

Sort rows by one or more columns.
join –

Join with another LazyFrame.
group_by –

Group by one or more columns.
explode –

Unnest a list column into separate rows.
pivot –

Pivot (reshape long to wide).
unpivot –

Unpivot (reshape wide to long). Also available as .melt().
union –

Stack rows from another LazyFrame below this one.
apply –

Apply a function to column values.
read –

Alias for .select(). Select columns by name.
head –

Return a lazy view of the first n rows.
optimize –

Return a new LazyFrame with an optimized query plan.
collect –

Materialize the query plan and cache the results.
count –

Return the total number of rows.
to_pylist –

Materialize and return data as a list of dicts.
to_pydict –

Materialize and return data as a dict of column lists.
to_tuples –

Materialize and return data as a list of tuples.
to_batches –

Materialize and return data in batches of dicts.
to_csv –

Stream the query plan to a CSV file with constant memory.
to_tsv –

Stream the query plan to a TSV (tab-separated) file.
to_jsonl –

Stream the query plan to a JSON Lines file.
to_json –

Write data as a JSON array.
to_parquet –

Write data to a Parquet file (requires pyarrow).
display –

Print a formatted table of the first n rows.
typed –

Wrap this LazyFrame as a TypedLazyFrame for IDE-friendly typed results.
validate –

Validate the schema against a TypedDict type.

Attributes:

schema (LazySchema) –

Output schema of this LazyFrame, computed without touching data.
columns (list[str]) –

List of column names.
dtypes (dict[str, type]) –

Mapping of column names to their Python types.
is_materialized (bool) –

Whether the query plan has been executed and data is cached.

schema `property` ¶

schema: LazySchema

Output schema of this LazyFrame, computed without touching data.

Examples:

>>> lf = LazyFrame([{"name": "Alice", "age": 30}])
>>> lf.schema.column_names
['name', 'age']
>>> lf.schema.dtypes
{'name': <class 'str'>, 'age': <class 'int'>}

columns `property` ¶

columns: list[str]

List of column names.

Examples:

>>> LazyFrame([{"x": 1, "y": 2}]).columns
['x', 'y']

dtypes `property` ¶

dtypes: dict[str, type]

Mapping of column names to their Python types.

Examples:

>>> LazyFrame([{"name": "Alice", "age": 30}]).dtypes
{'name': <class 'str'>, 'age': <class 'int'>}

is_materialized `property` ¶

is_materialized: bool

Whether the query plan has been executed and data is cached.

Returns True after calling .collect() or .to_pylist().

__init__ ¶

__init__(raw_data: list[dict] | list | dict | Iterable | None = None, *, name: str | None = None) -> None

Create a LazyFrame from in-memory data.

Parameters:

raw_data (list[dict] | list | dict | Iterable | None, default: None ) –

Input data as a list of dicts, list of tuples, list of objects with __dict__, a dict of columns, or an iterable.
name (str | None, default: None ) –

Optional name for the LazyFrame.

Examples:

From a list of dicts:

>>> lf = LazyFrame([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
>>> lf.columns
['name', 'age']

From a dict of columns:

>>> lf = LazyFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
>>> lf.to_pylist()
[{'x': 1, 'y': 'a'}, {'x': 2, 'y': 'b'}, {'x': 3, 'y': 'c'}]

From a list of tuples (auto-named columns):

>>> lf = LazyFrame([(1, "a"), (2, "b")])
>>> lf.columns
['col_0', 'col_1']
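
The input shapes above all reduce to the same row-oriented form. The following plain-Python sketch shows one way to do that conversion; `rows_from_columns` and `rows_from_tuples` are hypothetical helpers, not part of pyfloe, and only the `col_0`, `col_1` naming is taken from the example above.

```python
def rows_from_columns(columns: dict) -> list:
    """Turn a dict of equal-length column lists into a list of row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def rows_from_tuples(tuples: list) -> list:
    """Name tuple positions col_0, col_1, ... as in the example above."""
    return [{f"col_{i}": value for i, value in enumerate(t)} for t in tuples]


from_columns = rows_from_columns({"x": [1, 2, 3], "y": ["a", "b", "c"]})
from_tuples = rows_from_tuples([(1, "a"), (2, "b")])
```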

explain ¶

explain(optimized: bool = False) -> str

Return a string representation of the query plan tree.

Parameters:

optimized (bool, default: False ) –

If True, show the plan after optimization (filter pushdown, column pruning).

Examples:

>>> lf = LazyFrame([{"a": 1}]).filter(col("a") > 0).select("a")
>>> print(lf.explain())
Project [a]
  Filter [(col("a") > 0)]
    Scan [a] (1 rows)

print_explain ¶

print_explain(optimized: bool = False) -> None

Print the query plan tree to stdout.

Shortcut for print(lf.explain()).

Parameters:

optimized (bool, default: False ) –

If True, show the plan after optimization.

select ¶

select(*args: str | Expr) -> LazyFrame

Select columns by name or expression.

Parameters:

*args (str | Expr, default: () ) –

Column names as strings, or Expr objects.

Returns:

LazyFrame –

A new LazyFrame with only the selected columns.

Examples:

Select by column name:

>>> lf = LazyFrame([{"a": 1, "b": 2, "c": 3}])
>>> lf.select("a", "c").to_pylist()
[{'a': 1, 'c': 3}]

Select with expressions:

>>> lf.select(col("a"), (col("b") + col("c")).alias("sum")).to_pylist()
[{'a': 1, 'sum': 5}]

filter ¶

filter(predicate_or_col=None, _filter=None, **kwargs) -> LazyFrame

Filter rows matching a predicate expression.

Parameters:

predicate_or_col –

An Expr that evaluates to a boolean per row, e.g. col("amount") > 100.

Returns:

LazyFrame –

A new LazyFrame with only matching rows.

Examples:

>>> orders = LazyFrame([
...     {"product": "A", "amount": 250, "region": "EU"},
...     {"product": "B", "amount": 75,  "region": "US"},
...     {"product": "C", "amount": 180, "region": "EU"},
... ])
>>> orders.filter(col("amount") > 100).to_pylist()
[{'product': 'A', 'amount': 250, 'region': 'EU'}, {'product': 'C', 'amount': 180, 'region': 'EU'}]

Compound filters with & and |:

>>> orders.filter((col("region") == "EU") & (col("amount") > 200)).to_pylist()
[{'product': 'A', 'amount': 250, 'region': 'EU'}]

with_column ¶

with_column(name_or_expr: str | Expr, expr: Expr | None = None) -> LazyFrame

Add a computed column to the LazyFrame.

Can be called with a name and expression, or with a single expression whose output name is derived via .alias() or from the underlying column reference.

Parameters:

name_or_expr (str | Expr) –

Column name (str) or an expression with an inferrable output name.
expr (Expr | None, default: None ) –

Expression to compute the column values (required when name_or_expr is a string).

Returns:

LazyFrame –

A new LazyFrame with the additional column.

Examples:

>>> lf = LazyFrame([{"price": 100}, {"price": 200}])
>>> lf.with_column("tax", col("price") * 0.2).to_pylist()
[{'price': 100, 'tax': 20.0}, {'price': 200, 'tax': 40.0}]
>>> lf.with_column((col("price") * 0.2).alias("tax")).to_pylist()
[{'price': 100, 'tax': 20.0}, {'price': 200, 'tax': 40.0}]

with_columns ¶

with_columns(*args: Expr, **kwargs: Expr) -> LazyFrame

Add multiple computed columns at once.

Accepts positional expressions (with names derived from .alias() or the underlying column) and/or keyword arguments.

Parameters:

*args (Expr, default: () ) –

Expressions with inferrable output names.
**kwargs (Expr, default: {} ) –

Column name to expression mappings.

Returns:

LazyFrame –

A new LazyFrame with the additional columns.

Examples:

>>> lf = LazyFrame([{"amount": 250, "region": "eu"}])
>>> lf.with_columns(
...     (col("amount") * 2).alias("double"),
...     col("region").str.upper().alias("upper_region"),
... ).to_pylist()
[{'amount': 250, 'region': 'eu', 'double': 500, 'upper_region': 'EU'}]
>>> lf.with_columns(
...     double=col("amount") * 2,
...     upper_region=col("region").str.upper(),
... ).to_pylist()
[{'amount': 250, 'region': 'eu', 'double': 500, 'upper_region': 'EU'}]

drop ¶

drop(*columns: str) -> LazyFrame

Remove columns from the LazyFrame.

Parameters:

*columns (str, default: () ) –

Column names to drop.

Returns:

LazyFrame –

A new LazyFrame without the specified columns.

Examples:

>>> lf = LazyFrame([{"a": 1, "b": 2, "c": 3}])
>>> lf.drop("b", "c").columns
['a']

rename ¶

rename(mapping: dict[str, str]) -> LazyFrame

Rename columns.

Parameters:

mapping (dict[str, str]) –

Old name to new name mapping.

Returns:

LazyFrame –

A new LazyFrame with renamed columns.

Examples:

>>> lf = LazyFrame([{"amount": 100, "region": "EU"}])
>>> lf.rename({"amount": "price", "region": "area"}).columns
['price', 'area']

sort ¶

sort(*by: str, ascending: bool | list[bool] = True) -> LazyFrame

Sort rows by one or more columns.

Parameters:

*by (str, default: () ) –

Column names to sort by.
ascending (bool | list[bool], default: True ) –

Sort direction. A single bool applies to all columns; a list specifies per-column direction.

Returns:

LazyFrame –

A new LazyFrame with sorted rows.

Examples:

>>> lf = LazyFrame([{"name": "C"}, {"name": "A"}, {"name": "B"}])
>>> lf.sort("name").to_pylist()
[{'name': 'A'}, {'name': 'B'}, {'name': 'C'}]

Descending sort:

>>> lf.sort("name", ascending=False).to_pylist()
[{'name': 'C'}, {'name': 'B'}, {'name': 'A'}]
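
Per-column sort direction can be understood through stable sorting. The sketch below is plain Python rather than pyfloe's code, and `sort_rows` is a hypothetical helper. It sorts by the last key first, so earlier keys take precedence, relying on the fact that Python's sort is stable even with `reverse=True`.

```python
def sort_rows(rows, by, ascending=True):
    """Sort row dicts by several columns, each with its own direction."""
    if isinstance(ascending, bool):
        directions = [ascending] * len(by)  # one flag applies to all columns
    else:
        directions = list(ascending)
    out = list(rows)
    # Apply the least significant key first; stability keeps earlier passes.
    for name, asc in reversed(list(zip(by, directions))):
        out.sort(key=lambda row: row[name], reverse=not asc)
    return out


rows = [{"g": "a", "v": 1}, {"g": "b", "v": 2}, {"g": "a", "v": 3}]
ordered = sort_rows(rows, ["g", "v"], ascending=[True, False])
```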

join ¶

join(other: LazyFrame, on: str | list[str] | None = None, left_on: str | list[str] | None = None, right_on: str | list[str] | None = None, how: JoinHow = 'inner', sorted: bool = False, left_cols: str | list[str] | None = None, right_cols: str | list[str] | None = None) -> LazyFrame

Join with another LazyFrame.

Parameters:

other (LazyFrame) –

Right-side LazyFrame to join with.
on (str | list[str] | None, default: None ) –

Column name(s) present in both sides.
left_on (str | list[str] | None, default: None ) –

Column name(s) on the left side.
right_on (str | list[str] | None, default: None ) –

Column name(s) on the right side.
how (JoinHow, default: 'inner' ) –

Join type — 'inner', 'left', or 'full'.
sorted (bool, default: False ) –

If True, use sort-merge join (O(1) memory for pre-sorted inputs) instead of hash join.

Returns:

LazyFrame –

A new LazyFrame with columns from both sides.

Examples:

>>> orders = LazyFrame([{"id": 1, "cust": 101}, {"id": 2, "cust": 102}])
>>> customers = LazyFrame([{"cust": 101, "name": "Alice"}])
>>> orders.join(customers, on="cust", how="left").to_pylist()
[{'id': 1, 'cust': 101, 'right_cust': 101, 'name': 'Alice'}, {'id': 2, 'cust': 102, 'right_cust': None, 'name': None}]

Different key names on each side:

>>> left = LazyFrame([{"order_id": 1, "customer_id": 10}])
>>> right = LazyFrame([{"cid": 10, "name": "Alice"}])
>>> left.join(right, left_on="customer_id", right_on="cid").to_pylist()
[{'order_id': 1, 'customer_id': 10, 'cid': 10, 'name': 'Alice'}]
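
The `sorted=True` option refers to a sort-merge join. As a rough illustration of that technique only (not pyfloe's implementation), the sketch below walks two inputs that are already sorted by the key and holds just the current key's right-hand rows in memory. `merge_join` is a hypothetical helper; it performs an inner join on one shared key and simply merges the two row dicts, so it does not reproduce pyfloe's output column naming.

```python
from itertools import groupby
from typing import Any, Iterable, Iterator

Row = dict[str, Any]
_DONE = (None, None)  # sentinel returned when a side is exhausted


def merge_join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Inner-join two streams that are both already sorted by `key`."""
    left_groups = groupby(left, key=lambda row: row[key])
    right_groups = groupby(right, key=lambda row: row[key])
    lkey, lrows = next(left_groups, _DONE)
    rkey, rrows = next(right_groups, _DONE)
    while lrows is not None and rrows is not None:
        if lkey < rkey:
            lkey, lrows = next(left_groups, _DONE)    # left is behind
        elif lkey > rkey:
            rkey, rrows = next(right_groups, _DONE)   # right is behind
        else:
            right_block = list(rrows)  # only this key's right rows are buffered
            for lrow in lrows:
                for rrow in right_block:
                    yield {**lrow, **rrow}
            lkey, lrows = next(left_groups, _DONE)
            rkey, rrows = next(right_groups, _DONE)


orders = [{"id": 1, "cust": 101}, {"id": 2, "cust": 102}, {"id": 3, "cust": 102}]
customers = [
    {"cust": 101, "name": "Alice"},
    {"cust": 102, "name": "Bob"},
    {"cust": 104, "name": "Dan"},
]
joined = list(merge_join(orders, customers, "cust"))
```

If either input is not sorted by the key, this approach silently misses matches, which is why the flag is opt-in.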

group_by ¶

group_by(*columns: str, sorted: bool = False, **legacy_kwargs) -> LazyGroupBy | LazyFrame

Group by one or more columns.

Parameters:

*columns (str, default: () ) –

Column names to group by.
sorted (bool, default: False ) –

If True, use streaming sorted aggregation (requires input sorted by group columns).

Returns:

LazyGroupBy | LazyFrame –

A LazyGroupBy — call .agg() to specify aggregations.

Examples:

>>> orders = LazyFrame([
...     {"region": "EU", "amount": 250},
...     {"region": "EU", "amount": 180},
...     {"region": "US", "amount": 320},
... ])
>>> orders.group_by("region").agg(
...     col("amount").sum().alias("total"),
... ).sort("region").to_pylist()
[{'region': 'EU', 'total': 430}, {'region': 'US', 'total': 320}]
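
Streaming sorted aggregation, as enabled by `sorted=True`, can be pictured with `itertools.groupby`. This is a plain-Python sketch under the assumption that the input is already sorted by the group column; `sorted_group_sum` is a hypothetical helper and not pyfloe's code.

```python
from itertools import groupby
from operator import itemgetter


def sorted_group_sum(rows, group_col, value_col, out_name="total"):
    """Sum value_col per group, reading the sorted input in a single pass."""
    for key, group in groupby(rows, key=itemgetter(group_col)):
        # Each group is consumed and emitted before the next one is read.
        yield {group_col: key, out_name: sum(row[value_col] for row in group)}


orders = [
    {"region": "EU", "amount": 250},
    {"region": "EU", "amount": 180},
    {"region": "US", "amount": 320},
]
totals = list(sorted_group_sum(orders, "region", "amount"))
```

Because `groupby` only merges adjacent equal keys, unsorted input would produce several partial rows for the same group.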

explode ¶

explode(column: str) -> LazyFrame

Unnest a list column into separate rows.

Each element in the list becomes its own row, with all other column values duplicated.

Parameters:

column (str) –

Name of the column containing lists.

Returns:

LazyFrame –

A new LazyFrame with one row per list element.

Examples:

>>> lf = LazyFrame([
...     {"id": 1, "tags": ["a", "b"]},
...     {"id": 2, "tags": ["c"]},
... ])
>>> lf.explode("tags").to_pylist()
[{'id': 1, 'tags': 'a'}, {'id': 1, 'tags': 'b'}, {'id': 2, 'tags': 'c'}]
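
The row duplication described above can be sketched as a small generator. `explode_rows` is a hypothetical plain-Python helper, not pyfloe's implementation; in this sketch a row with an empty list produces no output rows, which may differ from what pyfloe does.

```python
def explode_rows(rows, column):
    """Yield one output row per element of the list stored in `column`."""
    for row in rows:
        for item in row[column]:
            # Copy the other values and replace the list with a single element.
            yield {**row, column: item}


data = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ["c"]},
]
exploded = list(explode_rows(data, "tags"))
```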

pivot ¶

pivot(index: str | list[str], on: str, values: str, agg: AggFunc = 'first', columns: list[str] | None = None) -> LazyFrame

Pivot (reshape long to wide).

Parameters:

index (str | list[str]) –

Column(s) to keep as row identifiers.
on (str) –

Column whose unique values become new column headers.
values (str) –

Column whose values fill the pivoted cells.
agg (AggFunc, default: 'first' ) –

Aggregation function name ('first', 'sum', etc.).
columns (list[str] | None, default: None ) –

Explicit list of pivot column values (auto-detected if None).

Returns:

LazyFrame –

A new LazyFrame in wide format.

Examples:

>>> lf = LazyFrame([
...     {"name": "Alice", "subject": "math", "score": 90},
...     {"name": "Alice", "subject": "english", "score": 85},
...     {"name": "Bob", "subject": "math", "score": 78},
...     {"name": "Bob", "subject": "english", "score": 92},
... ])
>>> lf.pivot(index="name", on="subject", values="score",
...          columns=["math", "english"]).sort("name").to_pylist()
[{'name': 'Alice', 'math': 90, 'english': 85}, {'name': 'Bob', 'math': 78, 'english': 92}]
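
For readers new to pivoting, this plain-Python sketch shows the long-to-wide reshape with the `'first'` aggregation and an explicit column list. `pivot_first` is a hypothetical helper for a single index column; it is an illustration, not pyfloe's implementation.

```python
def pivot_first(rows, index, on, values, columns):
    """Reshape long rows to wide rows, keeping the first value per cell."""
    wide = {}       # index value -> output row
    filled = set()  # (index value, header) cells that already hold a value
    for row in rows:
        key = row[index]
        if key not in wide:
            # Start every pivot column as None so missing cells stay explicit.
            wide[key] = {index: key, **dict.fromkeys(columns)}
        header = row[on]
        if header in columns and (key, header) not in filled:
            wide[key][header] = row[values]
            filled.add((key, header))
    return list(wide.values())


scores = [
    {"name": "Alice", "subject": "math", "score": 90},
    {"name": "Alice", "subject": "english", "score": 85},
    {"name": "Bob", "subject": "math", "score": 78},
    {"name": "Bob", "subject": "english", "score": 92},
]
pivoted = pivot_first(scores, "name", "subject", "score", ["math", "english"])
```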

unpivot ¶

unpivot(id_columns: str | list[str], value_columns: str | list[str] | None = None, variable_name: str = 'variable', value_name: str = 'value') -> LazyFrame

Unpivot (reshape wide to long). Also available as .melt().

Parameters:

id_columns (str | list[str]) –

Column(s) to keep as identifiers.
value_columns (str | list[str] | None, default: None ) –

Column(s) to unpivot. If None, all non-id columns.
variable_name (str, default: 'variable' ) –

Name for the new column holding original column names.
value_name (str, default: 'value' ) –

Name for the new column holding the values.

Returns:

LazyFrame –

A new LazyFrame in long format.

Examples:

>>> lf = LazyFrame([
...     {"name": "Alice", "math": 90, "english": 85},
...     {"name": "Bob", "math": 78, "english": 92},
... ])
>>> lf.unpivot("name", ["math", "english"]).sort("name", "variable").to_pylist()
[{'name': 'Alice', 'variable': 'english', 'value': 85}, {'name': 'Alice', 'variable': 'math', 'value': 90}, {'name': 'Bob', 'variable': 'english', 'value': 92}, {'name': 'Bob', 'variable': 'math', 'value': 78}]
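
The wide-to-long reshape can be written as a short generator. `unpivot_rows` is a hypothetical plain-Python helper for a single id column, shown for illustration only; the default names mirror the `variable_name` and `value_name` defaults documented above.

```python
def unpivot_rows(rows, id_column, value_columns,
                 variable_name="variable", value_name="value"):
    """Emit one long-format row per (input row, value column) pair."""
    for row in rows:
        for name in value_columns:
            yield {
                id_column: row[id_column],
                variable_name: name,      # the original column name
                value_name: row[name],    # the cell value
            }


wide = [
    {"name": "Alice", "math": 90, "english": 85},
    {"name": "Bob", "math": 78, "english": 92},
]
long_rows = sorted(
    unpivot_rows(wide, "name", ["math", "english"]),
    key=lambda row: (row["name"], row["variable"]),
)
```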

union ¶

union(other: LazyFrame) -> LazyFrame

Stack rows from another LazyFrame below this one.

Both LazyFrames must have the same columns.

Parameters:

other (LazyFrame) –

LazyFrame to append.

Returns:

LazyFrame –

A new LazyFrame with rows from both inputs.

Examples:

>>> a = LazyFrame([{"x": 1}, {"x": 2}])
>>> b = LazyFrame([{"x": 3}])
>>> a.union(b).to_pylist()
[{'x': 1}, {'x': 2}, {'x': 3}]

apply ¶

apply(func: Callable, columns: list[str] | None = None, output_dtype: type | None = None) -> LazyFrame

Apply a function to column values.

Parameters:

func (Callable) –

Function to apply to each cell value.
columns (list[str] | None, default: None ) –

Columns to apply to. If None, applies to all columns.
output_dtype (type | None, default: None ) –

Expected output type (for schema inference).

Returns:

LazyFrame –

A new LazyFrame with the function applied.

Examples:

Apply to specific columns:

>>> lf = LazyFrame([{"name": "Alice", "age": 30}])
>>> lf.apply(str, columns=["age"]).to_pylist()
[{'name': 'Alice', 'age': '30'}]

Apply to all columns:

>>> lf.apply(str).to_pylist()
[{'name': 'Alice', 'age': '30'}]

read ¶

read(columns: str | list[str]) -> LazyFrame

Alias for .select(). Select columns by name.

Parameters:

columns (str | list[str]) –

Column name or list of column names to select.

Returns:

LazyFrame –

A new LazyFrame with only the specified columns.

head ¶

head(n: int = 5) -> LazyFrame

Return a lazy view of the first n rows.

The upstream plan is not executed until the result is materialized (e.g. via .collect() or .to_pylist()).

Parameters:

n (int, default: 5 ) –

Maximum number of rows to return.

Returns:

LazyFrame –

A new LazyFrame with a Limit node in the plan.

Examples:

>>> lf = LazyFrame([{"x": i} for i in range(100)])
>>> lf.head(3).to_pylist()
[{'x': 0}, {'x': 1}, {'x': 2}]
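
A limit over a lazy source can stop reading as soon as it has enough rows. The sketch below shows this with `itertools.islice` on a generator. It illustrates the general idea only; whether pyfloe's Limit node stops reading upstream this early is an assumption, and `numbers` and `pulled` are hypothetical names.

```python
from itertools import islice

pulled = []  # records which rows the source actually produced


def numbers():
    for i in range(100):
        pulled.append(i)
        yield {"x": i}


lazy_head = islice(numbers(), 3)  # nothing has been read yet
pulled_before = list(pulled)
first_three = list(lazy_head)     # reads exactly three rows, then stops
```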

optimize ¶

optimize() -> LazyFrame

Return a new LazyFrame with an optimized query plan.

Applies filter pushdown and column pruning.

Returns:

LazyFrame –

A new LazyFrame wrapping the optimized plan.

Examples:

>>> lf = LazyFrame([{"a": 1, "b": 2}]).select("a").filter(col("a") > 0)
>>> opt = lf.optimize()
>>> opt.to_pylist()
[{'a': 1}]
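
Filter pushdown can be illustrated on a toy plan tree. The sketch below is not pyfloe's optimizer: `Scan`, `Project`, `Filter`, `push_filter_down`, and `execute` are hypothetical names, and the rule only handles a filter on one column sitting directly above a plain column projection.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Scan:
    rows: list


@dataclass
class Project:
    columns: tuple
    child: Any


@dataclass
class Filter:
    column: str
    predicate: Callable[[Any], bool]
    child: Any


def push_filter_down(plan):
    """Rewrite Filter(Project(x)) as Project(Filter(x)) when the column survives."""
    if isinstance(plan, Filter) and isinstance(plan.child, Project):
        project = plan.child
        if plan.column in project.columns:
            pushed = Filter(plan.column, plan.predicate, project.child)
            return Project(project.columns, push_filter_down(pushed))
    return plan


def execute(plan):
    if isinstance(plan, Scan):
        return list(plan.rows)
    if isinstance(plan, Project):
        return [{c: row[c] for c in plan.columns} for row in execute(plan.child)]
    # Otherwise the node is a Filter.
    return [row for row in execute(plan.child) if plan.predicate(row[plan.column])]


scan = Scan([{"a": 1, "b": 2}, {"a": -1, "b": 3}])
plan = Filter("a", lambda value: value > 0, Project(("a",), scan))
optimized = push_filter_down(plan)
```

Both plans return the same rows; the rewritten one discards non-matching rows before the projection does any work.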

collect ¶

collect(optimize: bool = True) -> LazyFrame

Materialize the query plan and cache the results.

After calling collect, subsequent operations use the cached data. Calling collect multiple times is safe and idempotent.

Parameters:

optimize (bool, default: True ) –

If True, run the query optimizer first.

Returns:

LazyFrame –

Self, with data materialized.

Examples:

>>> lf = LazyFrame([{"x": 1}, {"x": 2}]).filter(col("x") > 0)
>>> lf.is_materialized
False
>>> lf.collect()
LazyFrame [2 rows × 1 cols] (materialized)
...
>>> lf.is_materialized
True

count ¶

count(optimize: bool = True) -> int

Return the total number of rows.

Uses fast-path counting when possible (e.g. for in-memory data) without materializing all rows.

Parameters:

optimize (bool, default: True ) –

If True, run the query optimizer first.

Returns:

int –

The row count as an integer.

Examples:

>>> LazyFrame([{"x": 1}, {"x": 2}, {"x": 3}]).count()
3

to_pylist ¶

to_pylist() -> list[dict]

Materialize and return data as a list of dicts.

Examples:

>>> LazyFrame([{"a": 1, "b": 2}]).to_pylist()
[{'a': 1, 'b': 2}]

to_pydict ¶

to_pydict() -> dict[str, list]

Materialize and return data as a dict of column lists.

Examples:

>>> LazyFrame([{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]).to_pydict()
{'x': [1, 2], 'y': ['a', 'b']}

to_tuples ¶

to_tuples() -> list[tuple]

Materialize and return data as a list of tuples.

Examples:

>>> LazyFrame([{"x": 1, "y": "a"}]).to_tuples()
[(1, 'a')]

to_batches ¶

to_batches(optimize: bool = True) -> Iterator[list[dict]]

Materialize and return data in batches of dicts.

Parameters:

optimize (bool, default: True ) –

If True, run the query optimizer first.

Yields:

list[dict] –

Batches of rows, each represented as a list of dicts.
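
Batching a row stream can be sketched with `itertools.islice`. This is a plain-Python illustration, not pyfloe's implementation; `batched_rows` and its `batch_size` parameter are hypothetical, since the signature above does not expose a batch size.

```python
from itertools import islice


def batched_rows(rows, batch_size):
    """Yield lists of at most batch_size rows until the input is exhausted."""
    iterator = iter(rows)
    while batch := list(islice(iterator, batch_size)):
        yield batch


batches = list(batched_rows(({"x": i} for i in range(5)), batch_size=2))
batch_sizes = [len(batch) for batch in batches]
```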

to_csv ¶

to_csv(path: str, *, delimiter: str = ',', header: bool = True, encoding: str = 'utf-8') -> None

Stream the query plan to a CSV file with constant memory.

Data is written row-by-row without buffering the entire dataset, so this works for arbitrarily large pipelines.

Parameters:

path (str) –

Output file path.
delimiter (str, default: ',' ) –

Field delimiter character.
header (bool, default: True ) –

Whether to write a header row.
encoding (str, default: 'utf-8' ) –

File encoding.

Examples:

>>> lf = LazyFrame([{"name": "Alice", "age": 30}])
>>> lf.to_csv("output.csv")
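
Row-by-row writing is what keeps memory use flat. The sketch below shows the pattern with the standard `csv` module; it is an illustration rather than pyfloe's code. `stream_to_csv` is a hypothetical helper that takes the column names explicitly, whereas a LazyFrame would know them from its schema.

```python
import csv
import os
import tempfile


def stream_to_csv(rows, fieldnames, path, *, delimiter=",", header=True,
                  encoding="utf-8"):
    """Write rows one at a time; only the current row is held in memory."""
    with open(path, "w", newline="", encoding=encoding) as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter=delimiter)
        if header:
            writer.writeheader()
        for row in rows:  # rows may be a generator of any length
            writer.writerow(row)


def generated_rows():
    yield {"name": "Alice", "age": 30}
    yield {"name": "Bob", "age": 25}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.csv")
    stream_to_csv(generated_rows(), ["name", "age"], path)
    with open(path, newline="", encoding="utf-8") as handle:
        written = list(csv.reader(handle))  # read back to inspect the file
```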

to_tsv ¶

to_tsv(path: str, **kwargs: Any) -> None

Stream the query plan to a TSV (tab-separated) file.

Equivalent to lf.to_csv(path, delimiter='\t').

Parameters:

path (str) –

Output file path.
**kwargs (Any, default: {} ) –

Additional arguments passed to .to_csv().

to_jsonl ¶

to_jsonl(path: str, *, encoding: str = 'utf-8') -> None

Stream the query plan to a JSON Lines file.

Parameters:

path (str) –

Output file path.
encoding (str, default: 'utf-8' ) –

File encoding.
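
JSON Lines stores one JSON object per line, which is what makes streaming output possible. A minimal plain-Python sketch of that format follows; `stream_to_jsonl` is a hypothetical helper, not pyfloe's implementation.

```python
import json
import os
import tempfile


def stream_to_jsonl(rows, path, *, encoding="utf-8"):
    """Write each row as one JSON object on its own line."""
    with open(path, "w", encoding=encoding) as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rows.jsonl")
    stream_to_jsonl(iter([{"x": 1}, {"x": 2}]), path)
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()

parsed = [json.loads(line) for line in lines]
```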

to_json ¶

to_json(path: str, *, encoding: str = 'utf-8', indent: int | None = None) -> None

Write data as a JSON array.

Parameters:

path (str) –

Output file path.
encoding (str, default: 'utf-8' ) –

File encoding.
indent (int | None, default: None ) –

JSON indentation level.

to_parquet ¶

to_parquet(path: str, **kwargs: Any) -> None

Write data to a Parquet file (requires pyarrow).

Parameters:

path (str) –

Output file path.
**kwargs (Any, default: {} ) –

Additional arguments passed to pyarrow.

display ¶

display(n: int = 20, max_col_width: int = 30, optimize: bool = True) -> None

Print a formatted table of the first n rows.

Parameters:

n (int, default: 20 ) –

Maximum number of rows to display.
max_col_width (int, default: 30 ) –

Truncate cell values longer than this.
optimize (bool, default: True ) –

If True, run the query optimizer first.

Examples:

>>> lf = LazyFrame([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
>>> lf.display()
name  | age
------+----
Alice | 30
Bob   | 25

typed ¶

typed(row_type: type[T]) -> TypedLazyFrame[T]

Wrap this LazyFrame as a TypedLazyFrame for IDE-friendly typed results.

Operations that preserve the schema (filter, sort, head) return a TypedLazyFrame, so .to_pylist() returns list[T] in type checkers.

Parameters:

row_type (type[T]) –

A TypedDict class describing the row schema.

Returns:

TypedLazyFrame[T] –

A TypedLazyFrame wrapping the same query plan.

Examples:

>>> from typing import TypedDict
>>> class Order(TypedDict):
...     order_id: int
...     amount: float
>>> orders = LazyFrame([{"order_id": 1, "amount": 99.9}]).typed(Order)
>>> isinstance(orders, TypedLazyFrame)
True

validate ¶

validate(row_type: type) -> LazyFrame

Validate the schema against a TypedDict type.

Parameters:

row_type (type) –

A TypedDict class. Each key is checked against the LazyFrame's schema for presence and type compatibility.

Returns:

LazyFrame –

Self, if validation passes.

Raises:

TypeError –

If the schema doesn't match the TypedDict.

Examples:

>>> from typing import TypedDict
>>> class Order(TypedDict):
...     order_id: int
...     amount: float
>>> LazyFrame([{"order_id": 1, "amount": 9.9}]).validate(Order)
LazyFrame [1 rows × 2 cols]
...
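
A check for presence and type compatibility might look like the sketch below. It is an illustration under stated assumptions, not pyfloe's implementation: `check_schema` is a hypothetical helper that takes a `dtypes`-style mapping, treats "compatible" as `issubclass`, and only handles plain classes as annotations (no generics such as `list[int]`).

```python
from typing import TypedDict, get_type_hints


class Order(TypedDict):
    order_id: int
    amount: float


def check_schema(dtypes: dict, row_type: type) -> None:
    """Raise TypeError if a TypedDict key is missing or has the wrong type."""
    for name, expected in get_type_hints(row_type).items():
        if name not in dtypes:
            raise TypeError(f"missing column {name!r}")
        if not issubclass(dtypes[name], expected):
            raise TypeError(
                f"column {name!r} is {dtypes[name].__name__}, "
                f"expected {expected.__name__}"
            )


check_schema({"order_id": int, "amount": float}, Order)  # passes silently

try:
    check_schema({"order_id": int}, Order)
    error = None
except TypeError as exc:
    error = str(exc)  # the "amount" column is absent
```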

TypedLazyFrame¶

The TypedLazyFrame wraps a LazyFrame with a known row type (a TypedDict), enabling static type checkers and IDEs to infer the shape of results from .to_pylist().

TypedLazyFrame ¶

Bases: LazyFrame, Generic[T]

A LazyFrame with a known row type for static type checking.

Created via LazyFrame.typed(MyTypedDict). Operations that preserve the schema (filter, sort, head) return a TypedLazyFrame, so .to_pylist() returns list[T] in type checkers.

LazyGroupBy¶

The LazyGroupBy is created by calling LazyFrame.group_by(). Call .agg() on it to specify aggregation expressions and produce the grouped result.

LazyGroupBy ¶

Builder for grouped aggregation operations.

Created by calling LazyFrame.group_by(). Use .agg() to specify aggregation expressions and produce a result LazyFrame.

Methods:

agg –

Apply aggregation expressions to each group.

agg ¶

agg(*agg_exprs: AggExpr) -> LazyFrame

Apply aggregation expressions to each group.

Parameters:

*agg_exprs (AggExpr, default: () ) –

One or more aggregation expressions, e.g. col("amount").sum().alias("total").

Returns:

LazyFrame –

A new LazyFrame with one row per group.

Raises:

TypeError –

If any argument is not an AggExpr.

Examples:

>>> import pyfloe as pf
>>> orders = pf.LazyFrame([
...     {"region": "EU", "amount": 250},
...     {"region": "EU", "amount": 180},
...     {"region": "US", "amount": 320},
... ])
>>> orders.group_by("region").agg(
...     pf.col("amount").sum().alias("total"),
...     pf.col("amount").count().alias("n"),
... ).sort("region").to_pylist()
[{'region': 'EU', 'total': 430, 'n': 2}, {'region': 'US', 'total': 320, 'n': 1}]
