Aggregations
Group by and aggregate operations for summarizing data.
Basic Group By
import flowfile as ff

df = ff.FlowFrame({
    "category": ["A", "B", "A", "B", "A"],
    "value": [10, 20, 30, 40, 50],
    "quantity": [1, 2, 3, 4, 5]
})

# Simple aggregation
result = df.group_by("category").agg([
    ff.col("value").sum().alias("total_value"),
    ff.col("value").mean().alias("avg_value"),
    ff.col("quantity").count().alias("count")
])

# With description
result = df.group_by("category", description="Group by product category").agg([
    ff.col("value").sum().alias("total_value")
])
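To make the result of the simple aggregation concrete, the following plain-Python sketch computes the same three summaries for the sample data. It uses only the standard library and does not call flowfile; the helper name `summarize_by_category` is hypothetical, and the row order of the real result may differ.

```python
from collections import defaultdict

# Same sample data as the FlowFrame above, written as plain Python rows.
rows = [
    {"category": "A", "value": 10, "quantity": 1},
    {"category": "B", "value": 20, "quantity": 2},
    {"category": "A", "value": 30, "quantity": 3},
    {"category": "B", "value": 40, "quantity": 4},
    {"category": "A", "value": 50, "quantity": 5},
]


def summarize_by_category(rows):
    """Group rows by "category" and compute sum, mean, and count (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["category"]].append(row)

    summary = {}
    for category, members in groups.items():
        values = [m["value"] for m in members]
        summary[category] = {
            "total_value": sum(values),
            "avg_value": sum(values) / len(values),
            # count() counts the non-null entries of the column
            "count": sum(1 for m in members if m["quantity"] is not None),
        }
    return summary


summary = summarize_by_category(rows)
print(summary)
```

Each key of `summary` corresponds to one output row of the grouped result, and each inner key corresponds to one aliased aggregation.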
Multiple Grouping Columns
# Assumes df has "region", "category", and "sales" columns
result = df.group_by(["region", "category"]).agg([
    ff.col("sales").sum().alias("total_sales"),
    ff.col("sales").mean().alias("avg_sales")
])
Complex Group By
# Group by expression (creates polars_code node)
# Assumes df has "date" and "amount" columns
result = df.group_by([
    ff.col("date").dt.year().alias("year")
]).agg([
    ff.col("amount").sum()
])

# Dynamic aggregation
result = df.group_by("category").agg([
    ff.all().sum()  # Sum every non-grouping column (intended for numeric data)
])
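Grouping by an expression means the grouping key is derived from a column rather than read from it directly. As a rough illustration, this standard-library sketch derives a year from each date and sums amounts per year. The sample rows are hypothetical, since the example above does not show its data, and nothing here calls flowfile.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; the original example does not show its data.
rows = [
    {"date": date(2023, 3, 1), "amount": 100},
    {"date": date(2023, 11, 15), "amount": 50},
    {"date": date(2024, 1, 20), "amount": 75},
]

totals = defaultdict(int)
for row in rows:
    year = row["date"].year  # derived grouping key, like dt.year().alias("year")
    totals[year] += row["amount"]

totals_by_year = dict(totals)
print(totals_by_year)
```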
Available Aggregations
Function | Description |
---|---|
sum() | Sum of values |
mean() | Average value |
median() | Median value |
min() | Minimum value |
max() | Maximum value |
count() | Count of non-null values |
std() | Standard deviation |
var() | Variance |
first() | First value in group |
last() | Last value in group |
list() | Collect values into list |
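For readers who want to check results by hand, the sketch below computes rough standard-library equivalents of each aggregation for a single group. Several details are assumptions rather than confirmed flowfile behavior: that nulls are skipped by the numeric aggregations, that std() and var() use the sample (n - 1) form, and that first(), last(), and list() are position-based.

```python
import statistics

# One group's values, with a null (None) in the middle.
values = [10, None, 30, 50]
present = [v for v in values if v is not None]  # assumption: nulls are skipped

aggregates = {
    "sum": sum(present),
    "mean": statistics.mean(present),
    "median": statistics.median(present),
    "min": min(present),
    "max": max(present),
    "count": len(present),                   # count of non-null values
    "std": statistics.stdev(present),        # assumption: sample standard deviation
    "var": statistics.variance(present),     # assumption: sample variance
    "first": values[0],                      # assumption: position-based
    "last": values[-1],                      # assumption: position-based
    "list": list(values),                    # assumption: nulls are kept in the list
}
print(aggregates)
```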
Window Functions
# Running calculations
df = df.with_columns([
    ff.col("value").cumsum().over("category").alias("running_total"),
    ff.col("value").rank().over("category").alias("rank")
])
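Unlike an aggregation, a window function keeps one output value per input row. The following standard-library sketch shows what a per-category running total and rank would look like for the sample data from Basic Group By. It does not call flowfile, and the tie handling in the rank (1 plus the number of strictly smaller values) is an assumption; the sample data has no ties, so the choice does not affect the result here.

```python
from collections import defaultdict

categories = ["A", "B", "A", "B", "A"]
values = [10, 20, 30, 40, 50]

# Running total: accumulate separately for each category, in row order.
running = defaultdict(int)
running_total = []
for cat, val in zip(categories, values):
    running[cat] += val
    running_total.append(running[cat])

# Rank within category: 1 + number of strictly smaller values in the same group.
by_category = defaultdict(list)
for cat, val in zip(categories, values):
    by_category[cat].append(val)
rank = [
    1 + sum(other < val for other in by_category[cat])
    for cat, val in zip(categories, values)
]

print(running_total)
print(rank)
```

Both lists have five entries, one per input row, which is the key difference from the grouped results earlier on this page.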
Node Type Selection
Simple group_by operations, such as grouping by column names, create UI nodes. Complex expressions in group_by, such as ff.col("date").dt.year(), create polars_code nodes.