Expressions
Expressions are composable AST nodes that represent computations on column values. They are the building blocks for filters, computed columns, aggregations, and window functions.
Constructor Functions
These are the primary entry points for creating expressions.
col
Create a column reference expression.
This is the primary way to reference columns in filter, with_column, and aggregation expressions.
Parameters:
- name (str) – The column name to reference.
Examples:
lit
when
Begin a conditional expression (SQL CASE WHEN).
Chain with .when() for additional branches and .otherwise() for the default value.
Parameters:
- condition (Expr) – Boolean expression for the first branch.
- then_val (Any) – Value to return when condition is true.
Returns:
- WhenExpr – A WhenExpr that can be chained with .when() and .otherwise().
Examples:
>>> import pyfloe as pf
>>> lf = pf.LazyFrame([{"amount": 250}, {"amount": 75}, {"amount": 150}])
>>> lf.with_column("size",
... pf.when(pf.col("amount") > 200, "large")
... .when(pf.col("amount") > 100, "medium")
... .otherwise("small")
... ).to_pylist()
[{'amount': 250, 'size': 'large'}, {'amount': 75, 'size': 'small'}, {'amount': 150, 'size': 'medium'}]
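The branch order matters: in the example above, 250 satisfies both conditions but is labelled 'large' because the first matching branch wins, as in SQL CASE WHEN. A minimal plain-Python sketch of that evaluation rule follows. The helper case_when is hypothetical and is not part of pyfloe; it only models the semantics.

```python
def case_when(branches, default):
    """Build a row function that returns the value of the first matching branch.

    Hypothetical helper for illustration only; not part of pyfloe.
    """
    def evaluate(row):
        for condition, value in branches:
            if condition(row):
                return value  # first match wins, later branches are skipped
        return default        # plays the role of .otherwise()
    return evaluate


size = case_when(
    [
        (lambda r: r["amount"] > 200, "large"),
        (lambda r: r["amount"] > 100, "medium"),
    ],
    default="small",
)

rows = [{"amount": 250}, {"amount": 75}, {"amount": 150}]
labelled = [{**r, "size": size(r)} for r in rows]
print(labelled)
```

Swapping the two branches would label 250 as 'medium', which is why the most specific condition usually goes first.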
Window Functions
row_number
Create a row_number window function expression.
Assigns a sequential integer to each row within its partition, starting at 1.
Examples:
>>> import pyfloe as pf
>>> data = [{"region": "EU", "amount": 100}, {"region": "EU", "amount": 200}]
>>> pf.LazyFrame(data).with_column("rn",
... pf.row_number().over(partition_by="region", order_by="amount")
... ).to_pylist()
[{'region': 'EU', 'amount': 100, 'rn': 1}, {'region': 'EU', 'amount': 200, 'rn': 2}]
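With more than one partition, the numbering restarts in each one. The sketch below models that in plain Python. The function add_row_number is a hypothetical helper, not pyfloe code, and it assumes the output keeps the input row order, which the page does not state.

```python
from collections import defaultdict


def add_row_number(rows, partition_by, order_by, name="rn"):
    """Number rows 1..n inside each partition, ordered by `order_by`.

    Hypothetical illustration of row_number semantics; not pyfloe's implementation.
    """
    # Collect the positions of the rows that belong to each partition.
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[row[partition_by]].append(index)

    # Sort each partition and hand out sequential numbers starting at 1.
    numbers = {}
    for indices in groups.values():
        ordered = sorted(indices, key=lambda i: rows[i][order_by])
        for position, i in enumerate(ordered, start=1):
            numbers[i] = position

    # Assumption: rows come back in their original order.
    return [{**row, name: numbers[i]} for i, row in enumerate(rows)]


data = [
    {"region": "EU", "amount": 200},
    {"region": "US", "amount": 50},
    {"region": "EU", "amount": 100},
]
numbered = add_row_number(data, "region", "amount")
print([r["rn"] for r in numbered])  # [2, 1, 1]
```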
rank
Create a rank window function expression.
Equal values receive the same rank, and the next distinct value skips ahead by the number of ties, leaving gaps (e.g. 1, 2, 2, 4).
Use with .over() to specify partitioning and ordering.
Examples:
>>> import pyfloe as pf
>>> data = [{"name": "a", "score": 10}, {"name": "b", "score": 20},
... {"name": "c", "score": 20}, {"name": "d", "score": 30}]
>>> pf.LazyFrame(data).with_column("r", pf.rank().over(order_by="score")).to_pylist()
[{'name': 'a', 'score': 10, 'r': 1}, {'name': 'b', 'score': 20, 'r': 2}, {'name': 'c', 'score': 20, 'r': 2}, {'name': 'd', 'score': 30, 'r': 4}]
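One way to read the gap rule: a value's rank is the 1-based position of its first occurrence in the sorted sequence. The sketch below shows this in plain Python. The function rank_with_gaps is a hypothetical name used for illustration and is not pyfloe's implementation.

```python
def rank_with_gaps(values):
    """Rank each value by the position of its first occurrence in sorted order.

    Hypothetical illustration of rank semantics; ascending order is assumed.
    """
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        # setdefault keeps the earliest position, so ties share one rank.
        first_position.setdefault(value, position)
    return [first_position[v] for v in values]


ranks = rank_with_gaps([10, 20, 20, 30])
print(ranks)  # [1, 2, 2, 4]
```

The two rows tied at 20 occupy positions 2 and 3, so the next value receives rank 4.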
dense_rank
Create a dense_rank window function expression.
Like rank, but without gaps in the ranking sequence (e.g. 1, 2, 2, 3).
Examples:
>>> import pyfloe as pf
>>> data = [{"name": "a", "score": 10}, {"name": "b", "score": 20},
... {"name": "c", "score": 20}, {"name": "d", "score": 30}]
>>> pf.LazyFrame(data).with_column("dr", pf.dense_rank().over(order_by="score")).to_pylist()
[{'name': 'a', 'score': 10, 'dr': 1}, {'name': 'b', 'score': 20, 'dr': 2}, {'name': 'c', 'score': 20, 'dr': 2}, {'name': 'd', 'score': 30, 'dr': 3}]
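Dense ranking can be read as numbering the distinct values rather than the rows. A short plain-Python sketch follows. The helper dense_rank_of is hypothetical and assumes ascending order; it is not pyfloe code.

```python
def dense_rank_of(values):
    """Rank each value by its position among the sorted distinct values.

    Hypothetical illustration of dense_rank semantics; ascending order is assumed.
    """
    rank_of = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    return [rank_of[v] for v in values]


dense = dense_rank_of([10, 20, 20, 30])
print(dense)  # [1, 2, 2, 3]
```

Because ties do not consume extra positions, the highest dense rank always equals the number of distinct values.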
Expression Classes
Expr
The base class for all expression types. Supports arithmetic (+, -, *, /), comparisons (>, <, ==, etc.), logical operators (&, |, ~), and method chaining.
Base class for all expressions in the query plan.
Expressions are composable AST nodes that represent computations on column values. They support arithmetic operators, comparisons, logical operators, and method chaining for aggregations, window functions, string operations, and datetime operations.
Examples:
Arithmetic:
>>> from pyfloe import LazyFrame, col
>>> LazyFrame([{"price": 100}]).with_column("sale", col("price") * 0.9).to_pylist()
[{'price': 100, 'sale': 90.0}]
Comparisons:
Logical operators:
>>> data = [{"age": 20, "active": True}, {"age": 20, "active": False}]
>>> LazyFrame(data).filter((col("age") > 18) & (col("active") == True)).to_pylist()
[{'age': 20, 'active': True}]
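To make "composable AST nodes" concrete, here is a toy model in plain Python. Operators build a tree instead of computing a value, and the tree is evaluated later against each row. The classes Node, Column, Const, and Binary are hypothetical and only sketch the idea; they are not pyfloe's classes.

```python
import operator


class Node:
    """Toy expression node. Hypothetical stand-in, not pyfloe's Expr."""

    def _binary(self, other, fn):
        # Wrap plain Python values so both operands are nodes.
        other = other if isinstance(other, Node) else Const(other)
        return Binary(self, other, fn)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def __gt__(self, other):
        return self._binary(other, operator.gt)

    def __eq__(self, other):
        return self._binary(other, operator.eq)

    def __and__(self, other):
        return self._binary(other, operator.and_)


class Column(Node):
    def __init__(self, name):
        self.name = name

    def evaluate(self, row):
        return row[self.name]


class Const(Node):
    def __init__(self, value):
        self.value = value

    def evaluate(self, row):
        return self.value


class Binary(Node):
    def __init__(self, left, right, fn):
        self.left, self.right, self.fn = left, right, fn

    def evaluate(self, row):
        return self.fn(self.left.evaluate(row), self.right.evaluate(row))


# Building the expression does no work; it only records the tree.
predicate = (Column("age") > 18) & (Column("active") == True)
rows = [{"age": 20, "active": True}, {"age": 20, "active": False}]
kept = [row for row in rows if predicate.evaluate(row)]
print(kept)  # [{'age': 20, 'active': True}]
```

The parentheses around each comparison are required because & binds more tightly than > and == in Python.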
Methods:
- alias – Rename the output column of this expression.
- cast – Cast the expression value to a different type.
- is_null – Test whether the value is None.
- is_not_null – Test whether the value is not None.
- is_in – Test whether the value is in a set of values.
- sum – Sum of non-null values. Use inside group_by().agg() or with .over().
- mean – Mean of non-null values. Use inside group_by().agg() or with .over().
- min – Minimum of non-null values. Use inside group_by().agg() or with .over().
- max – Maximum of non-null values. Use inside group_by().agg() or with .over().
- count – Count of non-null values. Use inside group_by().agg() or with .over().
- first – First non-null value in the group. Use inside group_by().agg().
- last – Last non-null value in the group. Use inside group_by().agg().
- n_unique – Count of distinct non-null values. Use inside group_by().agg().
- cumsum – Cumulative sum. Use with .over() to create a window expression.
- cummax – Cumulative maximum. Use with .over() to create a window expression.
- cummin – Cumulative minimum. Use with .over() to create a window expression.
- lag – Access the value from n rows before in the window.
- lead – Access the value from n rows ahead in the window.
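The aggregation methods above all operate on non-null values. The plain-Python sketch below shows what that means for one group of values. The function aggregate_non_null is a hypothetical helper, and its results for an empty group (0 for sum, None elsewhere) are assumptions, not documented pyfloe behaviour.

```python
from statistics import fmean


def aggregate_non_null(values):
    """Compute the listed aggregates after dropping None values.

    Hypothetical illustration; empty-group results are assumptions.
    """
    present = [v for v in values if v is not None]
    return {
        "sum": sum(present),
        "mean": fmean(present) if present else None,
        "min": min(present, default=None),
        "max": max(present, default=None),
        "count": len(present),
        "first": present[0] if present else None,
        "last": present[-1] if present else None,
        "n_unique": len(set(present)),
    }


stats = aggregate_non_null([None, 3, 1, None, 3])
print(stats["sum"], stats["count"], stats["n_unique"])  # 7 3 2
```

Note that count is 3 rather than 5: the two None entries are ignored, which also means mean divides by 3.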
Attributes:
- str (StringAccessor) – Access string methods on this expression.
- dt (DateTimeAccessor) – Access datetime methods on this expression.
str (property)
Access string methods on this expression.
Returns:
- StringAccessor – A StringAccessor providing .upper(), .lower(), .contains(), .replace(), etc.
Examples:
dt (property)
Access datetime methods on this expression.
Returns:
- DateTimeAccessor – A DateTimeAccessor providing .year(), .month(), .truncate(), .add_days(), etc.
Examples:
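alias
Rename the output column of this expression.
cast
Cast the expression value to a different type.
is_null
Test whether the value is None.
is_not_null
Test whether the value is not None.
is_in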
Test whether the value is in a set of values.
Parameters:
- values (Any) – Collection of values to test membership against.
Returns:
- UnaryExpr – A boolean expression.
Examples:
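sum
Sum of non-null values. Use inside group_by().agg() or with .over().
mean
Mean of non-null values. Use inside group_by().agg() or with .over().
min
Minimum of non-null values. Use inside group_by().agg() or with .over().
max
Maximum of non-null values. Use inside group_by().agg() or with .over().
count
Count of non-null values. Use inside group_by().agg() or with .over().
first
First non-null value in the group. Use inside group_by().agg().
last
Last non-null value in the group. Use inside group_by().agg().
n_unique
Count of distinct non-null values. Use inside group_by().agg().
cumsum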
Cumulative sum. Use with .over() to create a window expression.
Examples:
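As a plain-Python illustration of a cumulative sum computed per partition, the sketch below keeps one running total for each key. The function cumulative_sum is a hypothetical helper, not pyfloe code, and it assumes the rows already arrive in the window's order.

```python
from collections import defaultdict


def cumulative_sum(rows, value, partition_by, name="running"):
    """Attach a running total of `value` that restarts for each partition.

    Hypothetical illustration; rows are assumed to be in window order already.
    """
    totals = defaultdict(int)
    result = []
    for row in rows:
        key = row[partition_by]
        totals[key] += row[value]
        result.append({**row, name: totals[key]})
    return result


data = [
    {"region": "EU", "amount": 100},
    {"region": "US", "amount": 10},
    {"region": "EU", "amount": 200},
    {"region": "US", "amount": 5},
]
running = cumulative_sum(data, "amount", "region")
print([r["running"] for r in running])  # [100, 10, 300, 15]
```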
cummax
Cumulative maximum. Use with .over() to create a window expression.
Examples:
cummin
Cumulative minimum. Use with .over() to create a window expression.
Examples:
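A cumulative maximum or minimum is the same running computation with a different combining function. In plain Python, itertools.accumulate expresses both directly. This is a sketch of the semantics over one ordered window, not pyfloe code.

```python
from itertools import accumulate

scores = [3, 1, 4, 1, 5]

# Each output element is the max (or min) of everything seen so far.
running_max = list(accumulate(scores, max))
running_min = list(accumulate(scores, min))

print(running_max)  # [3, 3, 4, 4, 5]
print(running_min)  # [3, 1, 1, 1, 1]
```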
lag
Access the value from n rows before in the window.
Must be used with .over() to specify ordering.
Parameters:
- n (int, default: 1) – Number of rows to look back.
- default (Any, default: None) – Value to use when there is no previous row.
Examples:
lead
Access the value from n rows ahead in the window.
Must be used with .over() to specify ordering.
Parameters:
- n (int, default: 1) – Number of rows to look ahead.
- default (Any, default: None) – Value to use when there is no subsequent row.
Examples:
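lag and lead are mirror images: both shift an ordered sequence by n positions and fill the vacated slots with default. The plain-Python sketch below models this. The function shift is a hypothetical helper, not part of pyfloe, and it operates on one window that is already ordered.

```python
def shift(values, n, default=None):
    """Shift an ordered sequence by n positions.

    Positive n looks back (like lag); negative n looks ahead (like lead).
    Hypothetical illustration; not pyfloe's implementation.
    """
    size = len(values)
    return [
        values[i - n] if 0 <= i - n < size else default
        for i in range(size)
    ]


prices = [10, 12, 9]
lagged = shift(prices, 1)            # previous row's value
led = shift(prices, -1, default=0)   # next row's value, 0 at the end

print(lagged)  # [None, 10, 12]
print(led)     # [12, 9, 0]
```

The explicit bounds check matters: without it, Python's negative indexing would wrap around and return a value from the other end of the window instead of the default.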
Col
Bases: Expr
A column reference expression.
Evaluates to the value of the named column in each row.
Attributes:
- name – The column name this expression refers to.