
Expressions

Expressions are composable AST nodes that represent computations on column values. They are the building blocks for filters, computed columns, aggregations, and window functions.
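To make the idea of composable AST nodes concrete, here is a minimal plain-Python sketch of how operator overloading can build an expression tree that is evaluated later, one row at a time. The ToyExpr, ToyCol, ToyLit, and ToyBinary classes are hypothetical names invented for this illustration. They are not pyfloe's classes and make no claim about how pyfloe is implemented.

```python
import operator


class ToyExpr:
    """Hypothetical base node: operators build a tree instead of computing a value."""

    def __add__(self, other):
        return ToyBinary(operator.add, self, _wrap(other))

    def __mul__(self, other):
        return ToyBinary(operator.mul, self, _wrap(other))

    def __gt__(self, other):
        return ToyBinary(operator.gt, self, _wrap(other))


class ToyCol(ToyExpr):
    """Looks up a named column in the current row."""

    def __init__(self, name):
        self.name = name

    def eval(self, row):
        return row[self.name]


class ToyLit(ToyExpr):
    """Evaluates to the same constant for every row."""

    def __init__(self, value):
        self.value = value

    def eval(self, row):
        return self.value


class ToyBinary(ToyExpr):
    """Applies a two-argument operator to the results of two child nodes."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def eval(self, row):
        return self.op(self.left.eval(row), self.right.eval(row))


def _wrap(value):
    # Plain Python values become literal nodes so they can sit in the tree.
    return value if isinstance(value, ToyExpr) else ToyLit(value)


# Building the expression does no work; evaluation happens per row.
expr = (ToyCol("x") + ToyCol("y")) * 2 > 50
rows = [{"x": 10, "y": 20}, {"x": 1, "y": 2}]
results = [expr.eval(row) for row in rows]
print(results)
```

Because the tree is only a description of the computation, the same expression object can be reused in a filter, a computed column, or an aggregation without being re-written.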


Constructor Functions

These are the primary entry points for creating expressions.

col


col(name: str) -> Col

Create a column reference expression.

This is the primary way to reference columns in filter, with_column, and aggregation expressions.

Parameters:

  • name (str) –

    The column name to reference.

Examples:

>>> import pyfloe as pf
>>> lf = pf.LazyFrame([{"x": 10, "y": 20}])
>>> lf.with_column("total", pf.col("x") + pf.col("y")).to_pylist()
[{'x': 10, 'y': 20, 'total': 30}]

lit


lit(value: Any) -> Lit

Create a literal value expression.

Wraps a Python value as an expression for use in computations.

Parameters:

  • value (Any) –

    The constant value.

Examples:

>>> import pyfloe as pf
>>> lf = pf.LazyFrame([{"x": 10}])
>>> lf.filter(pf.col("x") > pf.lit(5)).to_pylist()
[{'x': 10}]

when


when(condition: Expr, then_val: Any) -> WhenExpr

Begin a conditional expression (SQL CASE WHEN).

Chain with .when() for additional branches and .otherwise() for the default value.

Parameters:

  • condition (Expr) –

    Boolean expression for the first branch.

  • then_val (Any) –

    Value to return when condition is true.

Returns:

  • WhenExpr

    A WhenExpr that can be chained with .when() and .otherwise().

Examples:

>>> import pyfloe as pf
>>> lf = pf.LazyFrame([{"amount": 250}, {"amount": 75}, {"amount": 150}])
>>> lf.with_column("size",
...     pf.when(pf.col("amount") > 200, "large")
...     .when(pf.col("amount") > 100, "medium")
...     .otherwise("small")
... ).to_pylist()
[{'amount': 250, 'size': 'large'}, {'amount': 75, 'size': 'small'}, {'amount': 150, 'size': 'medium'}]
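In the example above, 250 satisfies both conditions but is labelled 'large', which matches SQL CASE WHEN semantics: branches are tried in order and the first match wins. The following plain-Python sketch shows that evaluation order. The case_when helper is a hypothetical function written for this illustration, not part of pyfloe.

```python
def case_when(row, branches, default):
    """Return the value of the first branch whose predicate holds for the row."""
    for predicate, value in branches:
        if predicate(row):
            return value
    # No branch matched: fall back to the .otherwise() value.
    return default


# Branch order matters: the stricter test must come first.
branches = [
    (lambda r: r["amount"] > 200, "large"),
    (lambda r: r["amount"] > 100, "medium"),
]
rows = [{"amount": 250}, {"amount": 75}, {"amount": 150}]
sizes = [case_when(r, branches, "small") for r in rows]
print(sizes)
```

Swapping the two branches would label 250 as 'medium', so conditions are best listed from most to least specific.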

Window Functions

row_number


row_number() -> RankExpr

Create a row_number window function expression.

Assigns a sequential integer to each row within its partition, starting at 1.

Examples:

>>> import pyfloe as pf
>>> data = [{"region": "EU", "amount": 100}, {"region": "EU", "amount": 200}]
>>> pf.LazyFrame(data).with_column("rn",
...     pf.row_number().over(partition_by="region", order_by="amount")
... ).to_pylist()
[{'region': 'EU', 'amount': 100, 'rn': 1}, {'region': 'EU', 'amount': 200, 'rn': 2}]

rank


rank() -> RankExpr

Create a rank window function expression.

Equal values receive the same rank, and the ranks that follow a tie are skipped (e.g. 1, 2, 2, 4). Use with .over() to specify partitioning and ordering.

Examples:

>>> import pyfloe as pf
>>> data = [{"name": "a", "score": 10}, {"name": "b", "score": 20},
...         {"name": "c", "score": 20}, {"name": "d", "score": 30}]
>>> pf.LazyFrame(data).with_column("r", pf.rank().over(order_by="score")).to_pylist()
[{'name': 'a', 'score': 10, 'r': 1}, {'name': 'b', 'score': 20, 'r': 2}, {'name': 'c', 'score': 20, 'r': 2}, {'name': 'd', 'score': 30, 'r': 4}]

dense_rank


dense_rank() -> RankExpr

Create a dense_rank window function expression.

Like rank, but without gaps in the ranking sequence (e.g. 1, 2, 2, 3).

Examples:

>>> import pyfloe as pf
>>> data = [{"name": "a", "score": 10}, {"name": "b", "score": 20},
...         {"name": "c", "score": 20}, {"name": "d", "score": 30}]
>>> pf.LazyFrame(data).with_column("dr", pf.dense_rank().over(order_by="score")).to_pylist()
[{'name': 'a', 'score': 10, 'dr': 1}, {'name': 'b', 'score': 20, 'dr': 2}, {'name': 'c', 'score': 20, 'dr': 2}, {'name': 'd', 'score': 30, 'dr': 3}]
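The three ranking functions differ only in how they treat ties. The plain-Python sketch below computes all three over an already sorted list of values so the differences can be compared side by side. The helper functions are hypothetical stand-ins for illustration and do not reflect pyfloe's internals.

```python
def row_number(sorted_values):
    """Sequential numbering from 1; ties still get distinct numbers."""
    return list(range(1, len(sorted_values) + 1))


def rank(sorted_values):
    """Ties share a rank; the next distinct value takes its 1-based position."""
    ranks = []
    for i, value in enumerate(sorted_values):
        if i > 0 and value == sorted_values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)
    return ranks


def dense_rank(sorted_values):
    """Ties share a rank; the next distinct value takes the previous rank plus 1."""
    ranks = []
    for i, value in enumerate(sorted_values):
        if i == 0:
            ranks.append(1)
        elif value == sorted_values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks


scores = [10, 20, 20, 30]  # assumed to be sorted by the order_by column
print(row_number(scores))
print(rank(scores))
print(dense_rank(scores))
```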

Expression Classes

Expr

The base class for all expression types. Supports arithmetic (+, -, *, /), comparisons (>, <, ==, etc.), logical operators (&, |, ~), and method chaining.

Expr

Base class for all expressions in the query plan.

Expressions are composable AST nodes that represent computations on column values. They support arithmetic operators, comparisons, logical operators, and method chaining for aggregations, window functions, string operations, and datetime operations.

Examples:

Arithmetic:

>>> from pyfloe import LazyFrame, col  # assumed by the remaining examples in this section
>>> LazyFrame([{"price": 100}]).with_column("sale", col("price") * 0.9).to_pylist()
[{'price': 100, 'sale': 90.0}]

Comparisons:

>>> LazyFrame([{"age": 20}, {"age": 15}]).filter(col("age") > 18).to_pylist()
[{'age': 20}]

Logical operators:

>>> data = [{"age": 20, "active": True}, {"age": 20, "active": False}]
>>> LazyFrame(data).filter((col("age") > 18) & (col("active") == True)).to_pylist()
[{'age': 20, 'active': True}]

Methods:

  • alias

    Rename the output column of this expression.

  • cast

    Cast the expression value to a different type.

  • is_null

    Test whether the value is None.

  • is_not_null

    Test whether the value is not None.

  • is_in

    Test whether the value is in a set of values.

  • sum

    Sum of non-null values. Use inside group_by().agg() or with .over().

  • mean

    Mean of non-null values. Use inside group_by().agg() or with .over().

  • min

    Minimum of non-null values. Use inside group_by().agg() or with .over().

  • max

    Maximum of non-null values. Use inside group_by().agg() or with .over().

  • count

    Count of non-null values. Use inside group_by().agg() or with .over().

  • first

    First non-null value in the group. Use inside group_by().agg().

  • last

    Last non-null value in the group. Use inside group_by().agg().

  • n_unique

    Count of distinct non-null values. Use inside group_by().agg().

  • cumsum

    Cumulative sum. Use with .over() to create a window expression.

  • cummax

    Cumulative maximum. Use with .over() to create a window expression.

  • cummin

    Cumulative minimum. Use with .over() to create a window expression.

  • lag

    Access the value n rows before the current row in the window.

  • lead

    Access the value n rows after the current row in the window.

Attributes:

  • str (StringAccessor) –

    Access string methods on this expression.

  • dt (DateTimeAccessor) –

    Access datetime methods on this expression.

str property
str: StringAccessor

Access string methods on this expression.

Returns:

  • StringAccessor

    A StringAccessor providing .upper(), .lower(), .contains(), .replace(), etc.

Examples:

>>> LazyFrame([{"name": "alice"}]).with_column("u", col("name").str.upper()).to_pylist()
[{'name': 'alice', 'u': 'ALICE'}]
dt property
dt: DateTimeAccessor

Access datetime methods on this expression.

Returns:

  • DateTimeAccessor

    A DateTimeAccessor providing .year(), .month(), .truncate(), .add_days(), etc.

Examples:

>>> from datetime import datetime
>>> lf = LazyFrame([{"ts": datetime(2024, 1, 15)}])
>>> lf.with_column("y", col("ts").dt.year()).to_pylist()
[{'ts': datetime.datetime(2024, 1, 15, 0, 0), 'y': 2024}]
alias
alias(name: str) -> Expr

Rename the output column of this expression.

Parameters:

  • name (str) –

    New column name.

Examples:

>>> LazyFrame([{"price": 100}]).with_column((col("price") * 0.2).alias("tax")).to_pylist()
[{'price': 100, 'tax': 20.0}]
cast
cast(dtype: type) -> CastExpr

Cast the expression value to a different type.

Parameters:

  • dtype (type) –

    Target Python type (e.g. int, str, float).

Examples:

>>> LazyFrame([{"amount": 42}]).with_column("s", col("amount").cast(str)).to_pylist()
[{'amount': 42, 's': '42'}]
is_null
is_null() -> UnaryExpr

Test whether the value is None.

Returns:

  • UnaryExpr

    A boolean expression that is True for None values.

Examples:

>>> lf = LazyFrame([{"x": 1}, {"x": None}])
>>> lf.filter(col("x").is_null()).to_pylist()
[{'x': None}]
is_not_null
is_not_null() -> UnaryExpr

Test whether the value is not None.

Returns:

  • UnaryExpr

    A boolean expression that is True for non-None values.

Examples:

>>> lf = LazyFrame([{"x": 1}, {"x": None}])
>>> lf.filter(col("x").is_not_null()).to_pylist()
[{'x': 1}]
is_in
is_in(values: Any) -> UnaryExpr

Test whether the value is in a set of values.

Parameters:

  • values (Any) –

    Collection of values to test membership against.

Returns:

  • UnaryExpr

    A boolean expression.

Examples:

>>> lf = LazyFrame([{"r": "EU"}, {"r": "US"}, {"r": "AP"}])
>>> lf.filter(col("r").is_in(["EU", "US"])).to_pylist()
[{'r': 'EU'}, {'r': 'US'}]
sum
sum() -> AggExpr

Sum of non-null values. Use inside group_by().agg() or with .over().

Examples:

>>> data = [{"g": "a", "amount": 10}, {"g": "a", "amount": 20}]
>>> LazyFrame(data).group_by("g").agg(col("amount").sum().alias("total")).to_pylist()
[{'g': 'a', 'total': 30}]
mean
mean() -> AggExpr

Mean of non-null values. Use inside group_by().agg() or with .over().

Examples:

>>> data = [{"g": "a", "score": 80}, {"g": "a", "score": 100}]
>>> LazyFrame(data).group_by("g").agg(col("score").mean().alias("avg")).to_pylist()
[{'g': 'a', 'avg': 90.0}]
min
min() -> AggExpr

Minimum of non-null values. Use inside group_by().agg() or with .over().

Examples:

>>> data = [{"g": "a", "price": 30}, {"g": "a", "price": 10}]
>>> LazyFrame(data).group_by("g").agg(col("price").min().alias("lowest")).to_pylist()
[{'g': 'a', 'lowest': 10}]
max
max() -> AggExpr

Maximum of non-null values. Use inside group_by().agg() or with .over().

Examples:

>>> data = [{"g": "a", "price": 30}, {"g": "a", "price": 10}]
>>> LazyFrame(data).group_by("g").agg(col("price").max().alias("highest")).to_pylist()
[{'g': 'a', 'highest': 30}]
count
count() -> AggExpr

Count of non-null values. Use inside group_by().agg() or with .over().

Examples:

>>> data = [{"g": "a", "order_id": 1}, {"g": "a", "order_id": 2}]
>>> LazyFrame(data).group_by("g").agg(col("order_id").count().alias("n")).to_pylist()
[{'g': 'a', 'n': 2}]
first
first() -> AggExpr

First non-null value in the group. Use inside group_by().agg().

Examples:

>>> data = [{"g": "a", "name": "Alice"}, {"g": "a", "name": "Bob"}]
>>> LazyFrame(data).group_by("g").agg(col("name").first().alias("first_name")).to_pylist()
[{'g': 'a', 'first_name': 'Alice'}]
last
last() -> AggExpr

Last non-null value in the group. Use inside group_by().agg().

Examples:

>>> data = [{"g": "a", "name": "Alice"}, {"g": "a", "name": "Bob"}]
>>> LazyFrame(data).group_by("g").agg(col("name").last().alias("last_name")).to_pylist()
[{'g': 'a', 'last_name': 'Bob'}]
n_unique
n_unique() -> AggExpr

Count of distinct non-null values. Use inside group_by().agg().

Examples:

>>> data = [{"g": "a", "product": "x"}, {"g": "a", "product": "x"}, {"g": "a", "product": "y"}]
>>> LazyFrame(data).group_by("g").agg(col("product").n_unique().alias("u")).to_pylist()
[{'g': 'a', 'u': 2}]
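Every aggregation above is described as operating on non-null values. The plain-Python sketch below shows what that means for a single group containing None entries. It only illustrates the documented semantics; how pyfloe treats a group made up entirely of None values is not stated here and is not covered by the sketch.

```python
# One group's values, with None standing in for missing data.
values = [None, 10, 20, 30, 20, None]

# Drop nulls first; every aggregate then works on what remains.
non_null = [v for v in values if v is not None]

aggregates = {
    "sum": sum(non_null),
    "mean": sum(non_null) / len(non_null),
    "min": min(non_null),
    "max": max(non_null),
    "count": len(non_null),
    "first": non_null[0],
    "last": non_null[-1],
    "n_unique": len(set(non_null)),
}
print(aggregates)
```

Note that count is 4 rather than 6, and first is 10 rather than None, because the leading and trailing None entries are skipped.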
cumsum
cumsum() -> CumExpr

Cumulative sum. Use with .over() to create a window expression.

Examples:

>>> data = [{"date": 1, "amount": 10}, {"date": 2, "amount": 20}]
>>> LazyFrame(data).with_column("cs", col("amount").cumsum().over(order_by="date")).to_pylist()
[{'date': 1, 'amount': 10, 'cs': 10}, {'date': 2, 'amount': 20, 'cs': 30}]
cummax
cummax() -> CumExpr

Cumulative maximum. Use with .over() to create a window expression.

Examples:

>>> data = [{"round": 1, "score": 5}, {"round": 2, "score": 3}, {"round": 3, "score": 8}]
>>> LazyFrame(data).with_column("cm", col("score").cummax().over(order_by="round")).to_pylist()
[{'round': 1, 'score': 5, 'cm': 5}, {'round': 2, 'score': 3, 'cm': 5}, {'round': 3, 'score': 8, 'cm': 8}]
cummin
cummin() -> CumExpr

Cumulative minimum. Use with .over() to create a window expression.

Examples:

>>> data = [{"round": 1, "score": 5}, {"round": 2, "score": 3}, {"round": 3, "score": 8}]
>>> LazyFrame(data).with_column("cm", col("score").cummin().over(order_by="round")).to_pylist()
[{'round': 1, 'score': 5, 'cm': 5}, {'round': 2, 'score': 3, 'cm': 3}, {'round': 3, 'score': 8, 'cm': 3}]
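The cumulative functions can be read as running reductions over the rows in window order. As a plain-Python sketch of the same arithmetic (using the standard library's itertools.accumulate, not pyfloe), the inputs from the examples above produce the same running values:

```python
from itertools import accumulate

amounts = [10, 20]   # values in order_by="date" order
scores = [5, 3, 8]   # values in order_by="round" order

# accumulate defaults to addition; passing max or min gives running extremes.
running_sum = list(accumulate(amounts))
running_max = list(accumulate(scores, max))
running_min = list(accumulate(scores, min))

print(running_sum)
print(running_max)
print(running_min)
```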
lag
lag(n: int = 1, default: Any = None) -> OffsetExpr

Access the value n rows before the current row in the window.

Must be used with .over() to specify ordering.

Parameters:

  • n (int, default: 1 ) –

    Number of rows to look back.

  • default (Any, default: None ) –

    Value to use when there is no previous row.

Examples:

>>> data = [{"id": 1, "value": 10}, {"id": 2, "value": 20}, {"id": 3, "value": 30}]
>>> LazyFrame(data).with_column("prev", col("value").lag(1, default=0).over(order_by="id")).to_pylist()
[{'id': 1, 'value': 10, 'prev': 0}, {'id': 2, 'value': 20, 'prev': 10}, {'id': 3, 'value': 30, 'prev': 20}]
lead
lead(n: int = 1, default: Any = None) -> OffsetExpr

Access the value n rows after the current row in the window.

Must be used with .over() to specify ordering.

Parameters:

  • n (int, default: 1 ) –

    Number of rows to look ahead.

  • default (Any, default: None ) –

    Value to use when there is no subsequent row.

Examples:

>>> data = [{"id": 1, "value": 10}, {"id": 2, "value": 20}, {"id": 3, "value": 30}]
>>> LazyFrame(data).with_column("next", col("value").lead(1, default=0).over(order_by="id")).to_pylist()
[{'id': 1, 'value': 10, 'next': 20}, {'id': 2, 'value': 20, 'next': 30}, {'id': 3, 'value': 30, 'next': 0}]
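lag and lead are mirror images: each shifts the ordered values by n positions and fills the vacated slots with the default. The plain-Python sketch below reproduces the two examples above on a list that is assumed to be in window order already. The lag and lead helpers are hypothetical illustrations that assume n is non-negative; they are not pyfloe functions.

```python
def lag(values, n=1, default=None):
    """Value n positions earlier, or default when no such row exists."""
    return [values[i - n] if i - n >= 0 else default for i in range(len(values))]


def lead(values, n=1, default=None):
    """Value n positions later, or default when no such row exists."""
    size = len(values)
    return [values[i + n] if i + n < size else default for i in range(size)]


ordered = [10, 20, 30]  # the "value" column sorted by "id"
prev_values = lag(ordered, 1, default=0)
next_values = lead(ordered, 1, default=0)
print(prev_values)
print(next_values)
```

With the default left as None, the first n rows of a lag (or the last n rows of a lead) come back as None, which can then be tested with is_null().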

Col


Bases: Expr

A column reference expression.

Evaluates to the value of the named column in each row.

Attributes:

  • name

    The column name this expression refers to.

Lit


Bases: Expr

A literal value expression.

Always evaluates to the same constant value, regardless of the row.

Attributes:

  • value

    The constant value.