API Reference
Overview of the pyfloe public API, organized by topic.
Quick reference
Constructors
| Function | Description |
|---|---|
| `LazyFrame(data)` | From list of dicts or objects |
| `read_csv(path, ...)` | Lazy CSV reader (auto-detects datetime) |
| `read_tsv(path, ...)` | Lazy TSV reader |
| `read_jsonl(path, ...)` | Lazy JSON Lines reader |
| `read_json(path, ...)` | JSON array reader |
| `read_fixed_width(path, widths, ...)` | Lazy fixed-width reader |
| `read_parquet(path, ...)` | Lazy Parquet reader (requires pyarrow) |
| `from_iter(source, ...)` | From any iterator/generator |
| `from_chunks(chunks, ...)` | From batched/paginated source |
| `Stream.from_iter(source, ...)` | True streaming pipeline |
| `Stream.from_csv(path, ...)` | Stream from CSV |
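
`from_iter` and `from_chunks` are described as taking sources that produce rows incrementally. As a rough illustration of what such sources look like, the sketch below uses only the standard library and does not call pyfloe. `fetch_page`, `paginated_chunks`, `RECORDS` and `PAGE_SIZE` are hypothetical names, and the exact argument that `from_chunks` expects is not specified in this reference.

```python
from itertools import chain

# Hypothetical stand-in for a paginated API: 7 records served 3 per page.
RECORDS = [{"id": i, "name": f"item-{i}"} for i in range(7)]
PAGE_SIZE = 3


def fetch_page(page):
    """Return one page of records (an empty list once past the end)."""
    start = page * PAGE_SIZE
    return RECORDS[start:start + PAGE_SIZE]


def paginated_chunks():
    """Yield one list of rows per page until the source is exhausted."""
    page = 0
    while True:
        batch = fetch_page(page)
        if not batch:
            return
        yield batch
        page += 1


# A batched source: each item is a list of row dicts.
chunk_sizes = [len(chunk) for chunk in paginated_chunks()]

# The same data flattened into a row-at-a-time source.
rows = list(chain.from_iterable(paginated_chunks()))
```

A generator such as `paginated_chunks` is the shape of input the `from_chunks` description suggests, and the flattened row iterator is the shape the `from_iter` description suggests. Whether either function takes the generator object itself or a callable that creates one is an assumption to confirm against the function's own documentation.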
LazyFrame methods
| Method | Lazy? | Description |
|---|---|---|
| `.select(*cols)` | ✓ | Select columns or expressions |
| `.filter(expr)` | ✓ | Filter rows |
| `.with_column(name, expr)` | ✓ | Add computed column |
| `.with_columns(**exprs)` | ✓ | Add multiple columns |
| `.drop(*cols)` | ✓ | Drop columns |
| `.rename(mapping)` | ✓ | Rename columns |
| `.sort(*cols)` | ✗ | Sort (Timsort) |
| `.join(other, on=)` | ✓ | Hash join (or sort-merge with `sorted=True`) |
| `.union(other)` | ✓ | Stack rows |
| `.explode(col)` | ✓ | Unnest lists |
| `.apply(func)` | ✓ | Apply to columns |
| `.group_by(*cols).agg(...)` | ✗ | Hash agg (or streaming with `sorted=True`) |
| `.head(n)` | partial | First n rows |
| `.optimize()` | ✓ | Optimized plan |
| `.collect()` | ✗ | Materialize |
| `.to_pylist()` | ✗ | → `List[dict]` |
| `.to_csv(path)` | streaming | Write CSV |
| `.to_jsonl(path)` | streaming | Write JSONL |
| `.explain()` | ✓ | Print plan |
| `.schema` | ✓ | Schema (no data) |
| `.typed(T)` | ✓ | → `TypedLazyFrame[T]` |
| `.validate(T)` | ✓ | Check schema |
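
The "Lazy?" column separates operations that can pass rows along as they arrive from operations that must see all of their input first. The sketch below is not pyfloe code. It uses plain Python generators to show why a filter followed by a "first n rows" step can stop early, while a sort cannot. All names in it (`counting_source`, `lazy_filter`, `blocking_sort`) are hypothetical, and pyfloe's internals may differ.

```python
from itertools import islice


def counting_source(rows, counter):
    """Yield rows one at a time, recording how many were pulled."""
    for row in rows:
        counter["pulled"] += 1
        yield row


def lazy_filter(rows, predicate):
    # Streaming: each matching row is emitted as soon as it is seen.
    return (row for row in rows if predicate(row))


def blocking_sort(rows, key):
    # Blocking: sorted() consumes every input row before emitting any.
    return iter(sorted(rows, key=key))


data = [{"id": i, "score": (i * 7) % 10} for i in range(100)]

# Filter, then take the first three rows: only a prefix of the input is read.
counter = {"pulled": 0}
pipeline = lazy_filter(counting_source(data, counter), lambda r: r["score"] > 5)
first_three = list(islice(pipeline, 3))
pulled_by_filter = counter["pulled"]

# Sort, then take the first three rows: the whole input is read first.
counter = {"pulled": 0}
pipeline = blocking_sort(counting_source(data, counter), key=lambda r: r["score"])
lowest_three = list(islice(pipeline, 3))
pulled_by_sort = counter["pulled"]
```

Here the filtered pipeline pulls only as many source rows as it needs to find three matches, whereas the sorted pipeline pulls all 100 before returning anything. That is the practical difference between the ✓ and ✗ rows in the table.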
Accessor methods
| Accessor | Methods |
|---|---|
| `.str` | `upper`, `lower`, `strip`, `title`, `len`, `contains`, `startswith`, `endswith`, `replace`, `slice` |
| `.dt` | `year`, `month`, `day`, `hour`, `minute`, `second`, `microsecond`, `weekday`, `isoweekday`, `quarter`, `week`, `day_of_year`, `day_name`, `month_name`, `date`, `time`, `truncate`, `strftime`, `epoch_seconds`, `add_days`, `add_hours`, `add_minutes`, `add_seconds` |
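
Several `.dt` method names match attributes and methods of Python's `datetime`. As a point of comparison, the sketch below computes the corresponding values for a single timestamp using only the standard library. It assumes, without confirmation from this reference, that the pyfloe accessors follow the same conventions (for example `weekday` counting from Monday as 0, and `week` meaning the ISO week number).

```python
from datetime import datetime, timedelta, timezone

ts = datetime(2024, 3, 15, 14, 30, 45, tzinfo=timezone.utc)  # a Friday

weekday = ts.weekday()              # Monday=0 ... Sunday=6
isoweekday = ts.isoweekday()        # Monday=1 ... Sunday=7
quarter = (ts.month - 1) // 3 + 1   # months 1-3 -> 1, 4-6 -> 2, ...
iso_week = ts.isocalendar()[1]      # ISO 8601 week number
day_of_year = ts.timetuple().tm_yday

# Names come from strftime and therefore depend on the active locale.
day_name = ts.strftime("%A")
month_name = ts.strftime("%B")

# Truncating to the day zeroes out every smaller unit.
truncated_to_day = ts.replace(hour=0, minute=0, second=0, microsecond=0)

# Seconds since 1970-01-01T00:00:00Z, as a float.
epoch_seconds = ts.timestamp()

# Date arithmetic via timedelta, analogous to add_days / add_hours.
plus_ten_days = ts + timedelta(days=10)
plus_two_hours = ts + timedelta(hours=2)
```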