Flowfile Core API Reference
This section provides a detailed API reference for the core Python objects, data models, and API routes in flowfile-core. The documentation is generated directly from the source code docstrings.
Core Components
This section covers the fundamental classes that manage the state and execution of data pipelines. These are the main "verbs" of the library.
FlowGraph
The FlowGraph is the central object that orchestrates the execution of data transformations. It is built incrementally as you chain operations. This DAG (Directed Acyclic Graph) represents the entire pipeline.
flowfile_core.flowfile.flow_graph.FlowGraph
A class representing a Directed Acyclic Graph (DAG) for data processing pipelines.
It manages nodes, connections, and the execution of the entire flow.
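To make the DAG idea concrete, the following is a minimal sketch in plain Python. It is not flowfile-core code: `ToyGraph`, `add_step`, and `run` are hypothetical names, and the sketch only illustrates how steps that depend on one another can be executed in dependency order.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Optional


class ToyGraph:
    """Hypothetical stand-in for a pipeline DAG (not the real FlowGraph)."""

    def __init__(self) -> None:
        self.functions: Dict[int, Callable[..., list]] = {}
        self.inputs: Dict[int, List[int]] = {}

    def add_step(self, node_id: int, function: Callable[..., list],
                 input_node_ids: Optional[List[int]] = None) -> None:
        # Store the processing function and the IDs of the nodes it depends on.
        self.functions[node_id] = function
        self.inputs[node_id] = list(input_node_ids or [])

    def run(self) -> Dict[int, list]:
        results: Dict[int, list] = {}
        # static_order() yields each node only after all of its predecessors.
        for node_id in TopologicalSorter(self.inputs).static_order():
            args = [results[i] for i in self.inputs[node_id]]
            results[node_id] = self.functions[node_id](*args)
        return results


graph = ToyGraph()
graph.add_step(1, lambda: [3, 1, 2])                                   # source
graph.add_step(2, lambda rows: [r for r in rows if r > 1], [1])        # filter
graph.add_step(3, lambda rows: sorted(rows), [2])                      # sort
results = graph.run()
print(results[3])  # [2, 3]
```

Because the graph is acyclic, a topological order always exists, so every step can read the finished results of the steps it depends on.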
Methods:
Name | Description |
---|---|
__init__ | Initializes a new FlowGraph instance. |
__repr__ | Provides the official string representation of the FlowGraph instance. |
add_cloud_storage_reader | Adds a cloud storage read node to the flow graph. |
add_cloud_storage_writer | Adds a node to write data to a cloud storage provider. |
add_cross_join | Adds a cross join node to the graph. |
add_database_reader | Adds a node to read data from a database. |
add_database_writer | Adds a node to write data to a database. |
add_datasource | Adds a data source node to the graph. |
add_dependency_on_polars_lazy_frame | Adds a special node that directly injects a Polars LazyFrame into the graph. |
add_explore_data | Adds a specialized node for data exploration and visualization. |
add_external_source | Adds a node for a custom external data source. |
add_filter | Adds a filter node to the graph. |
add_formula | Adds a node that applies a formula to create or modify a column. |
add_fuzzy_match | Adds a fuzzy matching node to join data on approximate string matches. |
add_graph_solver | Adds a node that solves graph-like problems within the data. |
add_group_by | Adds a group-by aggregation node to the graph. |
add_include_cols | Adds columns to both the input and output column lists. |
add_initial_node_analysis | Adds a data exploration/analysis node based on a node promise. |
add_join | Adds a join node to combine two data streams based on key columns. |
add_manual_input | Adds a node for manual data entry. |
add_node_promise | Adds a placeholder node to the graph that is not yet fully configured. |
add_node_step | The core method for adding or updating a node in the graph. |
add_output | Adds an output node to write the final data to a destination. |
add_pivot | Adds a pivot node to the graph. |
add_polars_code | Adds a node that executes custom Polars code. |
add_read | Adds a node to read data from a local file (e.g., CSV, Parquet, Excel). |
add_record_count | Adds a record count node to the graph. |
add_record_id | Adds a node to create a new column with a unique ID for each record. |
add_sample | Adds a node to take a random or top-N sample of the data. |
add_select | Adds a node to select, rename, reorder, or drop columns. |
add_sort | Adds a node to sort the data based on one or more columns. |
add_sql_source | Adds a node that reads data from a SQL source. |
add_text_to_rows | Adds a node that splits cell values into multiple rows. |
add_union | Adds a union node to combine multiple data streams. |
add_unique | Adds a node to find and remove duplicate rows. |
add_unpivot | Adds an unpivot node to the graph. |
apply_layout | Calculates and applies a layered layout to all nodes in the graph. |
cancel | Cancels an ongoing graph execution. |
close_flow | Performs cleanup operations, such as clearing node caches. |
copy_node | Creates a copy of an existing node. |
delete_node | Deletes a node from the graph and updates all its connections. |
generate_code | Generates code for the flow graph. |
get_frontend_data | Formats the graph structure into a JSON-like dictionary for a specific legacy frontend. |
get_implicit_starter_nodes | Finds nodes that can act as starting points but are not explicitly defined as such. |
get_node | Retrieves a node from the graph by its ID. |
get_node_data | Retrieves all data needed to render a node in the UI. |
get_node_storage | Serializes the entire graph's state into a storable format. |
get_nodes_overview | Gets a list of dictionary representations for all nodes in the graph. |
get_run_info | Gets a summary of the most recent graph execution. |
get_vue_flow_input | Formats the graph's nodes and edges into a schema suitable for the VueFlow frontend. |
print_tree | Prints the flow graph as a tree. |
remove_from_output_cols | Removes specified columns from the list of expected output columns. |
reset | Forces a deep reset on all nodes in the graph. |
run_graph | Executes the entire data flow graph from start to finish. |
save_flow | Saves the current state of the flow graph to a file. |
Attributes:
Name | Type | Description |
---|---|---|
execution_location | ExecutionLocationsLiteral | Gets the current execution location. |
execution_mode | ExecutionModeLiteral | Gets the current execution mode ('Development' or 'Performance'). |
flow_id | int | Gets the unique identifier of the flow. |
graph_has_functions | bool | Checks if the graph has any nodes. |
graph_has_input_data | bool | Checks if the graph has an initial input data source. |
node_connections | List[Tuple[int, int]] | Computes and returns a list of all connections in the graph. |
nodes | List[FlowNode] | Gets a list of all FlowNode objects in the graph. |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
execution_location (property, writable)
Gets the current execution location.
execution_mode (property, writable)
Gets the current execution mode ('Development' or 'Performance').
flow_id (property, writable)
Gets the unique identifier of the flow.
graph_has_functions (property)
Checks if the graph has any nodes.
graph_has_input_data (property)
Checks if the graph has an initial input data source.
node_connections (property)
Computes and returns a list of all connections in the graph.
Returns:
Type | Description |
---|---|
List[Tuple[int, int]] | A list of tuples, where each tuple is a (source_id, target_id) pair. |
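As an illustration of the (source_id, target_id) shape, the sketch below flattens a mapping from each node to its input nodes into a list of edge pairs. The helper name `connections_from_inputs` and the dictionary layout are assumptions made for this example, not the library's internal representation.

```python
from typing import Dict, List, Tuple


def connections_from_inputs(inputs: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Flatten a {target_id: [source_id, ...]} mapping into (source, target) pairs."""
    return [
        (source_id, target_id)
        for target_id, source_ids in inputs.items()
        for source_id in source_ids
    ]


# Hypothetical graph: nodes 1 and 2 feed a join (3), which feeds an output (4).
edges = connections_from_inputs({1: [], 2: [], 3: [1, 2], 4: [3]})
print(edges)  # [(1, 3), (2, 3), (3, 4)]
```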
nodes (property)
Gets a list of all FlowNode objects in the graph.
__init__(flow_settings, name=None, input_cols=None, output_cols=None, path_ref=None, input_flow=None, cache_results=False)
Initializes a new FlowGraph instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flow_settings | Union[FlowSettings, FlowGraphConfig] | The configuration settings for the flow. | required |
name | str | The name of the flow. | None |
input_cols | List[str] | A list of input column names. | None |
output_cols | List[str] | A list of output column names. | None |
path_ref | str | An optional path to an initial data source. | None |
input_flow | Union[ParquetFile, FlowDataEngine, FlowGraph] | An optional existing data object to start the flow with. | None |
cache_results | bool | A global flag to enable or disable result caching. | False |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
__repr__()
Provides the official string representation of the FlowGraph instance.
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_cloud_storage_reader(node_cloud_storage_reader)
Adds a cloud storage read node to the flow graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_cloud_storage_reader | NodeCloudStorageReader | The settings for the cloud storage read node. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_cloud_storage_writer(node_cloud_storage_writer)
Adds a node to write data to a cloud storage provider.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_cloud_storage_writer | NodeCloudStorageWriter | The settings for the cloud storage writer node. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_cross_join(cross_join_settings)
Adds a cross join node to the graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cross_join_settings | NodeCrossJoin | The settings for the cross join operation. | required |
Returns:
Type | Description |
---|---|
FlowGraph | The |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
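For readers unfamiliar with the operation itself, a cross join pairs every row of one input with every row of the other. The sketch below shows that semantics on plain dictionaries; `cross_join` here is a hypothetical helper, not the FlowGraph method, and it assumes the two inputs have no overlapping column names.

```python
from itertools import product
from typing import Dict, List


def cross_join(left: List[Dict], right: List[Dict]) -> List[Dict]:
    """Pair every left row with every right row (Cartesian product)."""
    return [{**left_row, **right_row} for left_row, right_row in product(left, right)]


sizes = [{"size": "S"}, {"size": "M"}]
colors = [{"color": "red"}, {"color": "blue"}, {"color": "green"}]
combos = cross_join(sizes, colors)
print(len(combos))  # 6
print(combos[0])    # {'size': 'S', 'color': 'red'}
```

The output size is the product of the input sizes, which is why cross joins on large inputs grow quickly.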
add_database_reader(node_database_reader)
Adds a node to read data from a database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_database_reader | NodeDatabaseReader | The settings for the database reader node. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_database_writer(node_database_writer)
Adds a node to write data to a database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_database_writer | NodeDatabaseWriter | The settings for the database writer node. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_datasource(input_file)
Adds a data source node to the graph.
This method serves as a factory for creating starting nodes, handling both file-based sources and direct manual data entry.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_file | Union[NodeDatasource, NodeManualInput] | The configuration object for the data source. | required |
Returns:
Type | Description |
---|---|
FlowGraph | The |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_dependency_on_polars_lazy_frame(lazy_frame, node_id)
Adds a special node that directly injects a Polars LazyFrame into the graph.
Note: This is intended for backend use and will not work in the UI editor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lazy_frame | LazyFrame | The Polars LazyFrame to inject. | required |
node_id | int | The ID for the new node. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_explore_data(node_analysis)
Adds a specialized node for data exploration and visualization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_analysis | NodeExploreData | The settings for the data exploration node. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_external_source(external_source_input)
Adds a node for a custom external data source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
external_source_input | NodeExternalSource | The settings for the external source node. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_filter(filter_settings)
Adds a filter node to the graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filter_settings | NodeFilter | The settings for the filter operation. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_formula(function_settings)
Adds a node that applies a formula to create or modify a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
function_settings | NodeFormula | The settings for the formula operation. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_fuzzy_match(fuzzy_settings)
Adds a fuzzy matching node to join data on approximate string matches.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fuzzy_settings | NodeFuzzyMatch | The settings for the fuzzy match operation. | required |
Returns:
Type | Description |
---|---|
FlowGraph | The |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
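The idea of joining on approximate string matches can be sketched with the standard library alone. This is an illustration of the concept, not the matching algorithm the node uses: `fuzzy_join`, the similarity measure (`difflib.SequenceMatcher`), and the 0.8 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def fuzzy_join(left: List[str], right: List[str],
               threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return (left, right, score) for every pair whose similarity meets the threshold."""
    matches = []
    for left_value in left:
        for right_value in right:
            # ratio() is a similarity in [0, 1]; comparison is case-insensitive here.
            score = SequenceMatcher(None, left_value.lower(), right_value.lower()).ratio()
            if score >= threshold:
                matches.append((left_value, right_value, score))
    return matches


pairs = fuzzy_join(["Jon Smith", "Acme Corp"], ["John Smith", "ACME Corp.", "Globex"])
for left_value, right_value, score in pairs:
    print(f"{left_value!r} ~ {right_value!r} ({score:.2f})")
```

Lowering the threshold admits more pairs at the cost of more false matches, so it is usually tuned per dataset.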
add_graph_solver(graph_solver_settings)
Adds a node that solves graph-like problems within the data.
This node can be used for operations like finding network paths, calculating connected components, or performing other graph algorithms on relational data that represents nodes and edges.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph_solver_settings | NodeGraphSolver | The settings object defining the graph inputs and the specific algorithm to apply. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
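One of the operations mentioned above, calculating connected components, can be sketched on an edge list as follows. The function name and the union-find approach are assumptions for illustration; the source does not state which algorithm the node applies internally.

```python
from typing import Dict, Hashable, Iterable, Tuple


def connected_components(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Hashable]:
    """Map each vertex to a representative vertex of its component (union-find)."""
    parent: Dict[Hashable, Hashable] = {}

    def find(vertex: Hashable) -> Hashable:
        parent.setdefault(vertex, vertex)
        while parent[vertex] != vertex:
            parent[vertex] = parent[parent[vertex]]  # path halving keeps trees shallow
            vertex = parent[vertex]
        return vertex

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components
    return {vertex: find(vertex) for vertex in parent}


# Each tuple is one row of relational data describing an edge between two nodes.
groups = connected_components([("a", "b"), ("b", "c"), ("x", "y")])
print(groups)  # {'a': 'a', 'b': 'a', 'c': 'a', 'x': 'x', 'y': 'x'}
```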
add_group_by(group_by_settings)
Adds a group-by aggregation node to the graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
group_by_settings | NodeGroupBy | The settings for the group-by operation. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_include_cols(include_columns)
Adds columns to both the input and output column lists.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
include_columns | List[str] | A list of column names to include. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_initial_node_analysis(node_promise)
Adds a data exploration/analysis node based on a node promise.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_promise | NodePromise | The promise representing the node to be analyzed. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_join(join_settings)
Adds a join node to combine two data streams based on key columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
join_settings | NodeJoin | The settings for the join operation. | required |
Returns:
Type | Description |
---|---|
FlowGraph | The |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_manual_input(input_file)
Adds a node for manual data entry.
This is a convenience alias for add_datasource.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_file | NodeManualInput | The settings and data for the manual input node. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_node_promise(node_promise)
Adds a placeholder node to the graph that is not yet fully configured.
Useful for building the graph structure before all settings are available.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_promise | NodePromise | A promise object containing basic node information. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
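The placeholder pattern described for add_node_promise can be pictured as a two-step process: reserve the node first, configure it later. The sketch below uses a hypothetical `ToyNode` dataclass and an invented settings dictionary; it does not reflect the actual fields of NodePromise.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ToyNode:
    """Hypothetical node record; `settings` stays None until the node is configured."""
    node_id: int
    node_type: str
    settings: Optional[Dict[str, Any]] = None

    @property
    def is_setup(self) -> bool:
        return self.settings is not None


nodes: Dict[int, ToyNode] = {}

# Step 1: reserve the node with only its ID and type (the "promise").
nodes[1] = ToyNode(node_id=1, node_type="filter")
before = nodes[1].is_setup

# Step 2: fill in the real settings once they are known.
nodes[1].settings = {"expression": "amount > 100"}  # illustrative settings only
after = nodes[1].is_setup

print(before, after)  # False True
```

This lets the graph structure (which nodes exist and how they connect) be laid out before every node has a complete configuration.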
add_node_step(node_id, function, input_columns=None, output_schema=None, node_type=None, drop_columns=None, renew_schema=True, setting_input=None, cache_results=None, schema_callback=None, input_node_ids=None)
The core method for adding or updating a node in the graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_id | Union[int, str] | The unique ID for the node. | required |
function | Callable | The core processing function for the node. | required |
input_columns | List[str] | A list of input column names required by the function. | None |
output_schema | List[FlowfileColumn] | A predefined schema for the node's output. | None |
node_type | str | A string identifying the type of node (e.g., 'filter', 'join'). | None |
drop_columns | List[str] | A list of columns to be dropped after the function executes. | None |
renew_schema | bool | If True, the schema is recalculated after execution. | True |
setting_input | Any | A configuration object containing settings for the node. | None |
cache_results | bool | If True, the node's results are cached for future runs. | None |
schema_callback | Callable | A function that dynamically calculates the output schema. | None |
input_node_ids | List[int] | A list of IDs for the nodes that this node depends on. | None |
Returns:
Type | Description |
---|---|
FlowNode | The created or updated FlowNode object. |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_output(output_file)
Adds an output node to write the final data to a destination.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_file | NodeOutput | The settings for the output file. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_pivot(pivot_settings)
Adds a pivot node to the graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pivot_settings | NodePivot | The settings for the pivot operation. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_polars_code(node_polars_code)
Adds a node that executes custom Polars code.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_polars_code | NodePolarsCode | The settings for the Polars code node. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_read(input_file)
Adds a node to read data from a local file (e.g., CSV, Parquet, Excel).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_file | NodeRead | The settings for the read operation. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_record_count(node_number_of_records)
Adds a record count node to the graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_number_of_records | NodeRecordCount | The settings for the record count operation. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_record_id(record_id_settings)
Adds a node to create a new column with a unique ID for each record.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
record_id_settings | NodeRecordId | The settings object specifying the name of the new record ID column. | required |
Returns:
Type | Description |
---|---|
FlowGraph | The |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_sample(sample_settings)
Adds a node to take a random or top-N sample of the data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample_settings | NodeSample | The settings object specifying the size of the sample. | required |
Returns:
Type | Description |
---|---|
FlowGraph | The |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
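The difference between a top-N sample and a random sample can be shown in a few lines of plain Python. Both helpers below are hypothetical and only illustrate the two behaviours named in the description; how NodeSample selects between them is not covered by this reference.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def top_n(rows: Sequence[T], n: int) -> List[T]:
    """Keep the first n rows in their existing order."""
    return list(rows[:n])


def random_sample(rows: Sequence[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Draw up to n distinct rows uniformly at random; a seed makes the draw repeatable."""
    return random.Random(seed).sample(list(rows), min(n, len(rows)))


rows = list(range(10))
first_three = top_n(rows, 3)
picked = random_sample(rows, 3, seed=42)
print(first_three)  # [0, 1, 2]
print(picked)       # three distinct values from rows; identical on every run with seed=42
```

A top-N sample is cheap and deterministic but biased toward whatever ordering the data already has, while a random sample is more representative.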
add_select(select_settings)
Adds a node to select, rename, reorder, or drop columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
select_settings | NodeSelect | The settings for the select operation. | required |
Returns:
Type | Description |
---|---|
FlowGraph | The |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_sort(sort_settings)
Adds a node to sort the data based on one or more columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sort_settings | NodeSort | The settings for the sort operation. | required |
Returns:
Type | Description |
---|---|
FlowGraph | The |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_sql_source(external_source_input)
Adds a node that reads data from a SQL source.
This is a convenience alias for add_external_source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
external_source_input | NodeExternalSource | The settings for the external SQL source node. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_text_to_rows(node_text_to_rows)
Adds a node that splits cell values into multiple rows.
This is useful for un-nesting data where a single field contains multiple values separated by a delimiter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_text_to_rows | NodeTextToRows | The settings object that specifies the column to split and the delimiter to use. | required |
Returns:
Type | Description |
---|---|
FlowGraph | The |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
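The un-nesting behaviour described for add_text_to_rows can be sketched on plain dictionaries as shown below. The helper `text_to_rows` is hypothetical, and trimming whitespace around each value is an assumption made for readability, not documented behaviour of the node.

```python
from typing import Dict, List


def text_to_rows(rows: List[Dict[str, str]], column: str, delimiter: str) -> List[Dict[str, str]]:
    """Emit one output row per delimited value found in `column`."""
    out = []
    for row in rows:
        for part in row[column].split(delimiter):
            # Copy the other columns unchanged; replace only the split column.
            out.append({**row, column: part.strip()})
    return out


data = [{"id": "1", "tags": "red, green"}, {"id": "2", "tags": "blue"}]
result = text_to_rows(data, column="tags", delimiter=",")
for row in result:
    print(row)
# {'id': '1', 'tags': 'red'}
# {'id': '1', 'tags': 'green'}
# {'id': '2', 'tags': 'blue'}
```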
add_union(union_settings)
Adds a union node to combine multiple data streams.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
union_settings | NodeUnion | The settings for the union operation. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_unique(unique_settings)
Adds a node to find and remove duplicate rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unique_settings | NodeUnique | The settings for the unique operation. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
add_unpivot(unpivot_settings)
Adds an unpivot node to the graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unpivot_settings | NodeUnpivot | The settings for the unpivot operation. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
apply_layout(y_spacing=150, x_spacing=200, initial_y=100)
Calculates and applies a layered layout to all nodes in the graph.
This updates their x and y positions for UI rendering.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y_spacing | int | The vertical spacing between layers. | 150 |
x_spacing | int | The horizontal spacing between nodes in the same layer. | 200 |
initial_y | int | The initial y-position for the first layer. | 100 |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
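A layered layout of this kind can be sketched in a few lines. The example below is an assumption about the general technique, not the flowfile-core implementation: layered_layout and its edge-list input are hypothetical, and each node's layer is taken to be its longest distance from a node without inputs. Only the three spacing parameters and their defaults come from the signature above.

```python
from graphlib import TopologicalSorter


def layered_layout(edges, y_spacing=150, x_spacing=200, initial_y=100):
    """Return {node: (x, y)} with one horizontal row of nodes per layer (hypothetical helper)."""
    predecessors = {}
    for source, target in edges:
        predecessors.setdefault(source, set())
        predecessors.setdefault(target, set()).add(source)

    # A node's layer is one more than the deepest layer among its inputs.
    layer = {}
    for node in TopologicalSorter(predecessors).static_order():
        layer[node] = 1 + max((layer[p] for p in predecessors[node]), default=-1)

    positions = {}
    next_slot = {}  # next free column index per layer
    for node, depth in layer.items():
        slot = next_slot.get(depth, 0)
        positions[node] = (slot * x_spacing, initial_y + depth * y_spacing)
        next_slot[depth] = slot + 1
    return positions


positions = layered_layout([(1, 2), (1, 3), (2, 4), (3, 4)])
print(positions)
```

In this diamond-shaped example, node 1 sits alone in the first layer, nodes 2 and 3 share the second layer side by side, and node 4 lands in the third.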
cancel()
Cancels an ongoing graph execution.
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
close_flow()
Performs cleanup operations, such as clearing node caches.
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
copy_node(new_node_settings, existing_setting_input, node_type)
Creates a copy of an existing node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_node_settings | NodePromise | The promise containing new settings (like ID and position). | required |
existing_setting_input | Any | The settings object from the node being copied. | required |
node_type | str | The type of the node being copied. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
delete_node(node_id)
Deletes a node from the graph and updates all its connections.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_id | Union[int, str] | The ID of the node to delete. | required |
Raises:
Type | Description |
---|---|
Exception | If the node with the given ID does not exist. |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
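Deleting a node means detaching it on both sides: its upstream nodes must stop pointing at it, and its downstream nodes must lose it as an input. A minimal sketch of that bookkeeping, using a hypothetical TinyGraph class with plain adjacency sets rather than the real FlowGraph and FlowNode objects:

```python
class TinyGraph:
    """Hypothetical stand-in for a graph; not the flowfile-core FlowGraph."""

    def __init__(self):
        self.inputs = {}   # node_id -> set of upstream node ids
        self.outputs = {}  # node_id -> set of downstream node ids

    def add_node(self, node_id):
        self.inputs.setdefault(node_id, set())
        self.outputs.setdefault(node_id, set())

    def connect(self, source, target):
        self.outputs[source].add(target)
        self.inputs[target].add(source)

    def delete_node(self, node_id):
        if node_id not in self.inputs:
            raise Exception(f"Node with id {node_id} does not exist")
        # Detach from upstream nodes, then from downstream nodes.
        for upstream in self.inputs.pop(node_id):
            self.outputs[upstream].discard(node_id)
        for downstream in self.outputs.pop(node_id):
            self.inputs[downstream].discard(node_id)


graph = TinyGraph()
for node_id in (1, 2, 3):
    graph.add_node(node_id)
graph.connect(1, 2)
graph.connect(2, 3)
graph.delete_node(2)
print(graph.inputs, graph.outputs)
```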
generate_code()
Generates code for the flow graph. This method exports the flow graph to a Polars-compatible format.
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
get_frontend_data()
Formats the graph structure into a JSON-like dictionary for a specific legacy frontend.
This method transforms the graph's state into a format compatible with the Drawflow.js library.
Returns:
Type | Description |
---|---|
dict | A dictionary representing the graph in Drawflow format. |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
get_implicit_starter_nodes()
Finds nodes that can act as starting points but are not explicitly defined as such.
Some nodes, like the Polars Code node, can function without an input. This method identifies such nodes if they have no incoming connections.
Returns:
Type | Description |
---|---|
List[FlowNode]
|
A list of |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
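The selection rule described above (a node type that can run without input, and no incoming connections) amounts to a simple filter. The sketch below uses a hypothetical StarterSketchNode dataclass; the can_run_without_input flag is an assumed stand-in for however flowfile-core marks such node types.

```python
from dataclasses import dataclass, field


@dataclass
class StarterSketchNode:
    """Hypothetical node; the flag below is an assumption, not a FlowNode attribute."""
    node_id: int
    can_run_without_input: bool = False
    inputs: list = field(default_factory=list)


def implicit_starters(nodes):
    """Keep nodes that may run without input and currently have no inputs."""
    return [n for n in nodes if n.can_run_without_input and not n.inputs]


source = StarterSketchNode(1)
code_node = StarterSketchNode(2, can_run_without_input=True)
fed_code_node = StarterSketchNode(3, can_run_without_input=True, inputs=[source])
starter_ids = [n.node_id for n in implicit_starters([source, code_node, fed_code_node])]
print(starter_ids)
```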
get_node(node_id=None)
Retrieves a node from the graph by its ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_id | Union[int, str] | The ID of the node to retrieve. If None, retrieves the last added node. | None |
Returns:
Type | Description |
---|---|
FlowNode \| None | The FlowNode object, or None if not found. |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
get_node_data(node_id, include_example=True)
Retrieves all data needed to render a node in the UI.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_id | int | The ID of the node. | required |
include_example | bool | Whether to include data samples in the result. | True |
Returns:
Type | Description |
---|---|
NodeData | A NodeData object, or None if the node is not found. |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
get_node_storage()
Serializes the entire graph's state into a storable format.
Returns:
Type | Description |
---|---|
FlowInformation | A FlowInformation object representing the complete graph. |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
get_nodes_overview()
Gets a list of dictionary representations for all nodes in the graph.
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
get_run_info()
Gets a summary of the most recent graph execution.
Returns:
Type | Description |
---|---|
RunInformation | A RunInformation object with details about the last run. |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
get_vue_flow_input()
Formats the graph's nodes and edges into a schema suitable for the VueFlow frontend.
Returns:
Type | Description |
---|---|
VueFlowInput | A VueFlowInput object. |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
print_tree(show_schema=False, show_descriptions=False)
Prints the flow graph as a tree.
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
remove_from_output_cols(columns)
Removes specified columns from the list of expected output columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | List[str] | A list of column names to remove. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
reset()
Forces a deep reset on all nodes in the graph.
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
run_graph()
Executes the entire data flow graph from start to finish.
It determines the correct execution order, runs each node, collects results, and handles errors and cancellations.
Returns:
Type | Description |
---|---|
RunInformation \| None | A RunInformation object summarizing the execution results. |
Raises:
Type | Description |
---|---|
Exception | If the flow is already running. |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
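Determining an execution order for a DAG is a topological sort. The following sketch shows that idea with the standard library's graphlib; run_in_order, the node names, and the rule that nodes downstream of a failure are skipped are illustrative assumptions, not the behaviour of run_graph itself, and cancellation is left out.

```python
from graphlib import TopologicalSorter


def run_in_order(functions, dependencies):
    """Run each node after its inputs and collect results and errors (hypothetical helper).

    functions:    node name -> callable that receives the upstream results
    dependencies: node name -> list of upstream node names
    """
    results, errors = {}, {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, [])
        if any(dep in errors for dep in upstream):
            errors[name] = "skipped: upstream node failed"
            continue
        try:
            results[name] = functions[name](*(results[dep] for dep in upstream))
        except Exception as exc:
            errors[name] = str(exc)
    return results, errors


functions = {
    "read": lambda: [3, 1, 2],
    "sort": lambda rows: sorted(rows),
    "total": lambda rows: sum(rows),
}
dependencies = {"read": [], "sort": ["read"], "total": ["sort"]}
results, errors = run_in_order(functions, dependencies)
print(results, errors)
```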
save_flow(flow_path)
Saves the current state of the flow graph to a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flow_path | str | The path where the flow file will be saved. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_graph.py
FlowNode
The FlowNode represents a single operation in the FlowGraph. Each node corresponds to a specific transformation or action, such as filtering or grouping data.
flowfile_core.flowfile.flow_node.flow_node.FlowNode
Represents a single node in a data flow graph.
This class manages the node's state, its data processing function, and its connections to other nodes within the graph.
Methods:
Name | Description |
---|---|
__call__ | Makes the node instance callable, acting as an alias for execute_node. |
__init__ | Initializes a FlowNode instance. |
__repr__ | Provides a string representation of the FlowNode instance. |
add_lead_to_in_depend_source | Ensures this node is registered in the leads_to_nodes list of its inputs. |
add_node_connection | Adds a connection from a source node to this node. |
calculate_hash | Calculates a hash based on settings and input node hashes. |
cancel | Cancels an ongoing external process if one is running. |
create_schema_callback_from_function | Wraps a node's function to create a schema callback that extracts the schema. |
delete_input_node | Removes a connection from a specific input node. |
delete_lead_to_node | Removes a connection to a specific downstream node. |
evaluate_nodes | Triggers a state reset for all directly connected downstream nodes. |
execute_full_local | Executes the node's logic locally, including example data generation. |
execute_local | Executes the node's logic locally. |
execute_node | Orchestrates the execution, handling location, caching, and retries. |
execute_remote | Executes the node's logic remotely or handles cached results. |
get_all_dependent_node_ids | Yields the IDs of all downstream nodes recursively. |
get_all_dependent_nodes | Yields all downstream nodes recursively. |
get_edge_input | Generates NodeEdge objects for all input connections to this node. |
get_flow_file_column_schema | Retrieves the schema for a specific column from the output schema. |
get_input_type | Gets the type of connection ('main', 'left', 'right') for a given input node ID. |
get_node_data | Gathers all necessary data for representing the node in the UI. |
get_node_information | Updates and returns the node's information object. |
get_node_input | Creates a NodeInput schema object for representing this node in the UI. |
get_output_data | Gets the full output data sample for this node. |
get_predicted_resulting_data | Creates a |
get_predicted_schema | Predicts the output schema of the node without full execution. |
get_repr | Gets a detailed dictionary representation of the node's state. |
get_resulting_data | Executes the node's function to produce the actual output data. |
get_table_example | Generates a |
needs_reset | Checks if the node's hash has changed, indicating an outdated state. |
needs_run | Determines if the node needs to be executed. |
post_init | Initializes or resets the node's attributes to their default states. |
prepare_before_run | Resets results and errors before a new execution. |
print | Helper method to log messages with node context. |
remove_cache | Removes cached results for this node. |
reset | Resets the node's execution state and schema information. |
set_node_information | Populates the |
store_example_data_generator | Stores a generator function for fetching a sample of the result data. |
update_node | Updates the properties of the node. |
Attributes:
Name | Type | Description |
---|---|---|
all_inputs | List[FlowNode] | Gets a list of all nodes connected to any input port. |
function | Callable | Gets the core processing function of the node. |
has_input | bool | Checks if this node has any input connections. |
has_next_step | bool | Checks if this node has any downstream connections. |
hash | str | Gets the cached hash for the node, calculating it if it doesn't exist. |
is_correct | bool | Checks if the node's input connections satisfy its template requirements. |
is_setup | bool | Checks if the node has been properly configured and is ready for execution. |
is_start | bool | Determines if the node is a starting node in the flow. |
left_input | Optional[FlowNode] | Gets the node connected to the left input port. |
main_input | List[FlowNode] | Gets the list of nodes connected to the main input port(s). |
name | str | Gets the name of the node. |
node_id | Union[str, int] | Gets the unique identifier of the node. |
number_of_leads_to_nodes | int \| None | Counts the number of downstream node connections. |
right_input | Optional[FlowNode] | Gets the node connected to the right input port. |
schema | List[FlowfileColumn] | Gets the definitive output schema of the node. |
schema_callback | SingleExecutionFuture | Gets the schema callback function, creating one if it doesn't exist. |
setting_input | Any | Gets the node's specific configuration settings. |
singular_input | bool | Checks if the node template specifies exactly one input. |
singular_main_input | FlowNode | Gets the input node, assuming it is a single-input type. |
state_needs_reset | bool | Checks if the node's state needs to be reset. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
all_inputs
property
Gets a list of all nodes connected to any input port.
Returns:
Type | Description |
---|---|
List[FlowNode] | A list of all input FlowNodes. |
function
property
writable
Gets the core processing function of the node.
Returns:
Type | Description |
---|---|
Callable | The callable function. |
has_input
property
Checks if this node has any input connections.
Returns:
Type | Description |
---|---|
bool | True if it has at least one input node. |
has_next_step
property
Checks if this node has any downstream connections.
Returns:
Type | Description |
---|---|
bool | True if it has at least one downstream node. |
hash
property
Gets the cached hash for the node, calculating it if it doesn't exist.
Returns:
Type | Description |
---|---|
str | The string hash value. |
is_correct
property
Checks if the node's input connections satisfy its template requirements.
Returns:
Type | Description |
---|---|
bool | True if connections are valid, False otherwise. |
is_setup
property
Checks if the node has been properly configured and is ready for execution.
Returns:
Type | Description |
---|---|
bool | True if the node is set up, False otherwise. |
is_start
property
Determines if the node is a starting node in the flow.
A starting node requires no inputs.
Returns:
Type | Description |
---|---|
bool | True if the node is a start node, False otherwise. |
left_input
property
Gets the node connected to the left input port.
Returns:
Type | Description |
---|---|
Optional[FlowNode] | The left input FlowNode, or None. |
main_input
property
Gets the list of nodes connected to the main input port(s).
Returns:
Type | Description |
---|---|
List[FlowNode] | A list of main input FlowNodes. |
name
property
writable
Gets the name of the node.
Returns:
Type | Description |
---|---|
str | The node's name. |
node_id
property
Gets the unique identifier of the node.
Returns:
Type | Description |
---|---|
Union[str, int] | The node's ID. |
number_of_leads_to_nodes
property
Counts the number of downstream node connections.
Returns:
Type | Description |
---|---|
int \| None | The number of nodes this node leads to. |
right_input
property
Gets the node connected to the right input port.
Returns:
Type | Description |
---|---|
Optional[FlowNode] | The right input FlowNode, or None. |
schema
property
Gets the definitive output schema of the node.
If not already run, it falls back to the predicted schema.
Returns:
Type | Description |
---|---|
List[FlowfileColumn] | A list of FlowfileColumn objects. |
schema_callback
property
writable
Gets the schema callback function, creating one if it doesn't exist.
The callback is used for predicting the output schema without full execution.
Returns:
Type | Description |
---|---|
SingleExecutionFuture | A SingleExecutionFuture instance wrapping the schema function. |
setting_input
property
writable
Gets the node's specific configuration settings.
Returns:
Type | Description |
---|---|
Any | The settings object. |
singular_input
property
Checks if the node template specifies exactly one input.
Returns:
Type | Description |
---|---|
bool | True if the node is a single-input type. |
singular_main_input
property
Gets the input node, assuming it is a single-input type.
Returns:
Type | Description |
---|---|
FlowNode | The single input FlowNode, or None. |
state_needs_reset
property
writable
Checks if the node's state needs to be reset.
Returns:
Type | Description |
---|---|
bool | True if a reset is required, False otherwise. |
__call__(*args, **kwargs)
Makes the node instance callable, acting as an alias for execute_node.
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
__init__(node_id, function, parent_uuid, setting_input, name, node_type, input_columns=None, output_schema=None, drop_columns=None, renew_schema=True, pos_x=0, pos_y=0, schema_callback=None)
Initializes a FlowNode instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_id | Union[str, int] | Unique identifier for the node. | required |
function | Callable | The core data processing function for the node. | required |
parent_uuid | str | The UUID of the parent flow. | required |
setting_input | Any | The configuration/settings object for the node. | required |
name | str | The name of the node. | required |
node_type | str | The type identifier of the node (e.g., 'join', 'filter'). | required |
input_columns | List[str] | List of column names expected as input. | None |
output_schema | List[FlowfileColumn] | The schema of the columns to be added. | None |
drop_columns | List[str] | List of column names to be dropped. | None |
renew_schema | bool | Flag to indicate if the schema should be renewed. | True |
pos_x | float | The x-coordinate on the canvas. | 0 |
pos_y | float | The y-coordinate on the canvas. | 0 |
schema_callback | Callable | A custom function to calculate the output schema. | None |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
__repr__()
Provides a string representation of the FlowNode instance.
Returns:
Type | Description |
---|---|
str | A string showing the node's ID and type. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
add_lead_to_in_depend_source()
Ensures this node is registered in the leads_to_nodes list of its inputs.
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
add_node_connection(from_node, insert_type='main')
Adds a connection from a source node to this node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
from_node | FlowNode | The node to connect from. | required |
insert_type | Literal['main', 'left', 'right'] | The type of input to connect to ('main', 'left', 'right'). | 'main' |
Raises:
Type | Description |
---|---|
Exception | If the insert_type is invalid. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
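The three connection types map naturally onto the main_input list and the optional left_input and right_input attributes listed above. The following is a minimal sketch of that dispatch using a hypothetical ConnectionSketchNode class; the attribute names main_inputs and leads_to are assumptions for illustration, and the real method may do more (for example, resetting state).

```python
class ConnectionSketchNode:
    """Hypothetical node used only to illustrate the insert_type dispatch."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.main_inputs = []    # any number of 'main' inputs
        self.left_input = None   # at most one 'left' input
        self.right_input = None  # at most one 'right' input
        self.leads_to = []       # downstream nodes

    def add_node_connection(self, from_node, insert_type="main"):
        if insert_type == "main":
            self.main_inputs.append(from_node)
        elif insert_type == "left":
            self.left_input = from_node
        elif insert_type == "right":
            self.right_input = from_node
        else:
            raise Exception(f"Invalid insert type: {insert_type}")
        # Register the reverse link so the source knows what it feeds.
        from_node.leads_to.append(self)


orders = ConnectionSketchNode(1)
customers = ConnectionSketchNode(2)
join = ConnectionSketchNode(3)
join.add_node_connection(orders, "left")
join.add_node_connection(customers, "right")
print(join.left_input.node_id, join.right_input.node_id)
```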
calculate_hash(setting_input)
Calculates a hash based on settings and input node hashes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
setting_input | Any | The node's settings object to be included in the hash. | required |
Returns:
Type | Description |
---|---|
str | A string hash value. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
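Because the hash depends on both the settings and the upstream hashes, a change anywhere upstream changes every hash downstream of it. A minimal sketch of that property, assuming SHA-256 over the repr of the settings (the hashing scheme that calculate_hash actually uses is not shown in this reference, and sketch_hash is a hypothetical name):

```python
import hashlib


def sketch_hash(setting_input, input_hashes):
    """Combine a node's settings with its inputs' hashes into one digest (assumed scheme)."""
    payload = repr(setting_input) + "|" + "|".join(input_hashes)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


upstream_hash = sketch_hash({"path": "data.csv"}, [])
filter_hash = sketch_hash({"column": "x"}, [upstream_hash])
print(filter_hash)
```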
cancel()
Cancels an ongoing external process if one is running.
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
create_schema_callback_from_function(f)
staticmethod
Wraps a node's function to create a schema callback that extracts the schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
f | Callable | The node's core function that returns a FlowDataEngine instance. | required |
Returns:
Type | Description |
---|---|
Callable[[], List[FlowfileColumn]] | A callable that, when executed, returns the output schema. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
delete_input_node(node_id, connection_type='input-0', complete=False)
Removes a connection from a specific input node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_id | int | The ID of the input node to disconnect. | required |
connection_type | InputConnectionClass | The specific input handle (e.g., 'input-0', 'input-1'). | 'input-0' |
complete | bool | If True, tries to delete from all input types. | False |
Returns:
Type | Description |
---|---|
bool | True if a connection was found and removed, False otherwise. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
delete_lead_to_node(node_id)
Removes a connection to a specific downstream node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_id | int | The ID of the downstream node to disconnect. | required |
Returns:
Type | Description |
---|---|
bool | True if the connection was found and removed, False otherwise. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
evaluate_nodes(deep=False)
Triggers a state reset for all directly connected downstream nodes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
deep | bool | If True, the reset propagates recursively through the entire downstream graph. | False |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
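The difference between a shallow and a deep reset can be shown with a small chain of nodes. This is a sketch under assumptions: ResetSketchNode, its was_reset flag, and the way reset forwards the deep flag are hypothetical and only mirror the behaviour described in the parameter table.

```python
class ResetSketchNode:
    """Hypothetical node that records whether it has been reset."""

    def __init__(self, name):
        self.name = name
        self.leads_to = []
        self.was_reset = False

    def reset(self, deep=False):
        self.was_reset = True
        if deep:
            # Keep propagating to this node's own downstream nodes.
            self.evaluate_nodes(deep=True)

    def evaluate_nodes(self, deep=False):
        for node in self.leads_to:
            node.reset(deep)


a, b, c = ResetSketchNode("a"), ResetSketchNode("b"), ResetSketchNode("c")
a.leads_to.append(b)
b.leads_to.append(c)

a.evaluate_nodes()           # shallow: only b is reset
shallow = (b.was_reset, c.was_reset)
a.evaluate_nodes(deep=True)  # deep: the reset reaches c as well
deep = (b.was_reset, c.was_reset)
print(shallow, deep)
```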
execute_full_local(performance_mode=False)
Executes the node's logic locally, including example data generation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
performance_mode | bool | If True, skips generating example data. | False |
Raises:
Type | Description |
---|---|
Exception | Propagates exceptions from the execution. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
execute_local(flow_id, performance_mode=False)
Executes the node's logic locally.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flow_id | int | The ID of the parent flow. | required |
performance_mode | bool | If True, skips generating example data. | False |
Raises:
Type | Description |
---|---|
Exception | Propagates exceptions from the execution. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
execute_node(run_location, reset_cache=False, performance_mode=False, retry=True, node_logger=None)
Orchestrates the execution, handling location, caching, and retries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
run_location | ExecutionLocationsLiteral | The location for execution ('local', 'remote'). | required |
reset_cache | bool | If True, forces removal of any existing cache. | False |
performance_mode | bool | If True, optimizes for speed over diagnostics. | False |
retry | bool | If True, allows retrying execution on recoverable errors. | True |
node_logger | NodeLogger | The logger for this node execution. | None |
Raises:
Type | Description |
---|---|
Exception | If the node_logger is not defined. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
execute_remote(performance_mode=False, node_logger=None)
Executes the node's logic remotely or handles cached results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
performance_mode | bool | If True, skips generating example data. | False |
node_logger | NodeLogger | The logger for this node execution. | None |
Raises:
Type | Description |
---|---|
Exception | If the node_logger is not provided or if execution fails. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
get_all_dependent_node_ids()
Yields the IDs of all downstream nodes recursively.
Returns:
Type | Description |
---|---|
None | A generator of all dependent node IDs. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
get_all_dependent_nodes()
Yields all downstream nodes recursively.
Returns:
Type | Description |
---|---|
None | A generator of all dependent FlowNode objects. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
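A recursive generator is a compact way to walk everything downstream of a node. The sketch below uses a hypothetical DependentSketchNode class with an assumed leads_to list; it illustrates the traversal pattern only and does not claim to match the FlowNode implementation, for instance in how nodes reachable by several paths are handled.

```python
class DependentSketchNode:
    """Hypothetical node with a list of downstream nodes."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.leads_to = []

    def get_all_dependent_nodes(self):
        # Depth-first: yield each direct child, then everything below it.
        for node in self.leads_to:
            yield node
            yield from node.get_all_dependent_nodes()

    def get_all_dependent_node_ids(self):
        for node in self.get_all_dependent_nodes():
            yield node.node_id


n1, n2, n3, n4 = (DependentSketchNode(i) for i in (1, 2, 3, 4))
n1.leads_to.extend([n2, n4])
n2.leads_to.append(n3)
dependent_ids = list(n1.get_all_dependent_node_ids())
print(dependent_ids)
```

In this sketch a node that is reachable along two different paths would be yielded twice; callers that need unique nodes can collect the IDs into a set.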
get_edge_input()
Generates NodeEdge objects for all input connections to this node.
Returns:
Type | Description |
---|---|
List[NodeEdge]
|
A list of |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
get_flow_file_column_schema(col_name)
Retrieves the schema for a specific column from the output schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col_name | str | The name of the column. | required |
Returns:
Type | Description |
---|---|
FlowfileColumn \| None | The FlowfileColumn object for that column, or None if not found. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
get_input_type(node_id)
Gets the type of connection ('main', 'left', 'right') for a given input node ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_id | int | The ID of the input node. | required |
Returns:
Type | Description |
---|---|
List | A list of connection types for that node ID. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
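The return type is a list rather than a single string, which suggests one upstream node can feed more than one connection. A minimal sketch under that assumption: the helper below and the `inputs` dictionary layout are hypothetical and do not mirror how `FlowNode` stores its inputs.

```python
from typing import Dict, List


def get_input_type(inputs: Dict[str, List[int]], node_id: int) -> List[str]:
    """Return every connection slot in which node_id appears (hypothetical helper)."""
    return [slot for slot, node_ids in inputs.items() if node_id in node_ids]


# Assumed layout: connection slot -> IDs of the nodes feeding that slot.
# Node 4 is joined with itself, so it appears on both sides.
join_inputs = {"main": [], "left": [4], "right": [4]}
types_for_4 = get_input_type(join_inputs, 4)
types_for_9 = get_input_type(join_inputs, 9)
```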
get_node_data(flow_id, include_example=False)
Gathers all necessary data for representing the node in the UI.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flow_id | int | The ID of the parent flow. | required |
include_example | bool | If True, includes data samples. | False |
Returns:
Type | Description |
---|---|
NodeData | A NodeData object. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
get_node_information()
Updates and returns the node's information object.
Returns:
Type | Description |
---|---|
NodeInformation | The updated NodeInformation object. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
get_node_input()
Creates a NodeInput schema object for representing this node in the UI.
Returns:
Type | Description |
---|---|
NodeInput | A NodeInput object. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
get_output_data()
Gets the full output data sample for this node.
Returns:
Type | Description |
---|---|
TableExample | A TableExample object. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
get_predicted_resulting_data()
Creates a FlowDataEngine instance based on the predicted schema.
This avoids executing the node's full logic.
Returns:
Type | Description |
---|---|
FlowDataEngine | A FlowDataEngine instance with a schema but no data. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
get_predicted_schema(force=False)
Predicts the output schema of the node without full execution.
It uses the schema_callback or infers from predicted data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
force | bool | If True, forces recalculation even if a predicted schema exists. | False |
Returns:
Type | Description |
---|---|
List[FlowfileColumn] \| None | A list of FlowfileColumn objects representing the predicted schema. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
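The behaviour described for `force` is a cache-with-override pattern. The sketch below illustrates it with plain Python; `SchemaPredictor`, its `infer` argument, and the `calls` counter are hypothetical names, and the real method returns `FlowfileColumn` objects rather than strings.

```python
from typing import Callable, List, Optional


class SchemaPredictor:
    """Hypothetical helper illustrating cached schema prediction."""

    def __init__(self, infer: Callable[[], List[str]],
                 schema_callback: Optional[Callable[[], List[str]]] = None) -> None:
        self._infer = infer
        self._schema_callback = schema_callback
        self._predicted: Optional[List[str]] = None
        self.calls = 0  # counts how often a schema was actually computed

    def get_predicted_schema(self, force: bool = False) -> List[str]:
        # Reuse the cached prediction unless the caller forces a refresh.
        if self._predicted is not None and not force:
            return self._predicted
        self.calls += 1
        # Prefer the custom callback; otherwise fall back to inference.
        source = self._schema_callback or self._infer
        self._predicted = source()
        return self._predicted


predictor = SchemaPredictor(infer=lambda: ["id", "name"])
first = predictor.get_predicted_schema()
second = predictor.get_predicted_schema()            # served from the cache
third = predictor.get_predicted_schema(force=True)   # recomputed
```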
get_repr()
Gets a detailed dictionary representation of the node's state.
Returns:
Type | Description |
---|---|
dict | A dictionary containing key information about the node. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
get_resulting_data()
Executes the node's function to produce the actual output data.
Handles both regular functions and external data sources.
Returns:
Type | Description |
---|---|
FlowDataEngine \| None | A FlowDataEngine instance containing the result, or None on error. |
Raises:
Type | Description |
---|---|
Exception | Propagates exceptions from the node's function execution. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
get_table_example(include_data=False)
Generates a TableExample model summarizing the node's output.
This can optionally include a sample of the data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
include_data | bool | If True, includes a data sample in the result. | False |
Returns:
Type | Description |
---|---|
TableExample \| None | A TableExample model, or None. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
needs_reset()
Checks if the node's hash has changed, indicating an outdated state.
Returns:
Type | Description |
---|---|
bool | True if the calculated hash differs from the stored hash. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
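The check above compares a freshly calculated hash with a stored one. The source does not say what goes into the hash, so the sketch below is only an illustration of the idea: `settings_hash` and its inputs (a settings dictionary plus the hashes of upstream nodes) are assumptions.

```python
import hashlib
import json


def settings_hash(settings: dict, input_hashes: list) -> str:
    """Hypothetical hash over a node's settings and the hashes of its inputs."""
    payload = json.dumps({"settings": settings, "inputs": input_hashes}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hash stored when the node last ran.
stored = settings_hash({"filter": "a > 1"}, [])

# Same settings: the hashes match, so no reset is needed.
unchanged = settings_hash({"filter": "a > 1"}, []) != stored

# Edited settings: the hashes differ, so the node is outdated.
changed = settings_hash({"filter": "a > 2"}, []) != stored
```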
needs_run(performance_mode, node_logger=None, execution_location='auto')
Determines if the node needs to be executed.
The decision is based on its run state, caching settings, and execution mode.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
performance_mode | bool | True if the flow is in performance mode. | required |
node_logger | NodeLogger | The logger instance for this node. | None |
execution_location | ExecutionLocationsLiteral | The target execution location. | 'auto' |
Returns:
Type | Description |
---|---|
bool | True if the node should be run, False otherwise. |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
post_init()
Initializes or resets the node's attributes to their default states.
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
prepare_before_run()
Resets results and errors before a new execution.
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
print(v)
Helper method to log messages with node context.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
v | Any | The message or value to log. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
remove_cache()
Removes cached results for this node.
Note: Currently not fully implemented.
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
reset(deep=False)
Resets the node's execution state and schema information.
This also triggers a reset on all downstream nodes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
deep | bool | If True, forces a reset even if the hash hasn't changed. | False |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
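A reset that cascades downstream can be sketched as follows. This is an illustration under assumptions, not the library's code: `ResettableNode`, `leads_to`, `has_run`, and the `dirty` flag (standing in for "the hash has changed") are hypothetical, and so is the choice to propagate `deep=True` to children.

```python
from typing import List


class ResettableNode:
    """Hypothetical node; attribute names are illustrative only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.leads_to: List["ResettableNode"] = []
        self.has_run = True
        self.dirty = False  # stands in for "the stored hash is outdated"

    def reset(self, deep: bool = False) -> None:
        # Skip the work when nothing changed, unless a deep reset is requested.
        if not (deep or self.dirty):
            return
        self.has_run = False
        self.dirty = False
        for child in self.leads_to:
            # Assumption: downstream results are stale, so force their reset.
            child.reset(deep=True)


source, sink = ResettableNode("source"), ResettableNode("sink")
source.leads_to.append(sink)

source.reset()  # no change detected, so nothing happens
state_after_soft = (source.has_run, sink.has_run)

source.reset(deep=True)  # forced, so both nodes are cleared
state_after_deep = (source.has_run, sink.has_run)
```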
set_node_information()
Populates the node_information attribute with the current state.
This includes the node's connections, settings, and position.
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
store_example_data_generator(external_df_fetcher)
Stores a generator function for fetching a sample of the result data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
external_df_fetcher | ExternalDfFetcher \| ExternalSampler | The process that generated the sample data. | required |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
update_node(function, input_columns=None, output_schema=None, drop_columns=None, name=None, setting_input=None, pos_x=0, pos_y=0, schema_callback=None)
Updates the properties of the node.
This is called during initialization and when settings are changed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
function | Callable | The new core data processing function. | required |
input_columns | List[str] | The new list of input columns. | None |
output_schema | List[FlowfileColumn] | The new schema of added columns. | None |
drop_columns | List[str] | The new list of dropped columns. | None |
name | str | The new name for the node. | None |
setting_input | Any | The new settings object. | None |
pos_x | float | The new x-coordinate. | 0 |
pos_y | float | The new y-coordinate. | 0 |
schema_callback | Callable | The new custom schema callback function. | None |
Source code in flowfile_core/flowfile_core/flowfile/flow_node/flow_node.py
The FlowDataEngine
The FlowDataEngine is the primary engine of the library, providing a rich API for data manipulation, I/O, and transformation. Its methods are grouped below by functionality.
flowfile_core.flowfile.flow_data_engine.flow_data_engine.FlowDataEngine
dataclass
The core data handling engine for Flowfile.
This class acts as a high-level wrapper around a Polars DataFrame or LazyFrame, providing a unified API for data ingestion, transformation, and output. It manages data state (lazy vs. eager), schema information, and execution logic.
Attributes:
Name | Type | Description |
---|---|---|
_data_frame | Union[DataFrame, LazyFrame] | The underlying Polars DataFrame or LazyFrame. |
columns | List[Any] | A list of column names in the current data frame. |
name | str | An optional name for the data engine instance. |
number_of_records | int | The number of records. Can be -1 for lazy frames. |
errors | List | A list of errors encountered during operations. |
_schema | Optional[List[FlowfileColumn]] | A cached list of FlowfileColumn objects. |
Methods:
Name | Description |
---|---|
__call__ | Makes the class instance callable, returning itself. |
__get_sample__ | Internal method to get a sample of the data. |
__getitem__ | Accesses a specific column or item from the DataFrame. |
__init__ | Initializes the FlowDataEngine from various data sources. |
__len__ | Returns the number of records in the table. |
__repr__ | Returns a string representation of the FlowDataEngine. |
add_new_values | Adds a new column with the provided values. |
add_record_id | Adds a record ID (row number) column to the DataFrame. |
apply_flowfile_formula | Applies a formula to create a new column or transform an existing one. |
apply_sql_formula | Applies an SQL-style formula using pl.sql_expr. |
assert_equal | Asserts that this DataFrame is equal to another. |
cache | Caches the current DataFrame to disk and updates the internal reference. |
calculate_schema | Calculates and returns the schema. |
change_column_types | Changes the data type of one or more columns. |
collect | Collects the data and returns it as a Polars DataFrame. |
collect_external | Materializes data from a tracked external source. |
concat | Concatenates this DataFrame with one or more other DataFrames. |
count | Gets the total number of records. |
create_from_external_source | Creates a FlowDataEngine from an external data source. |
create_from_path | Creates a FlowDataEngine from a local file path. |
create_from_path_worker | Creates a FlowDataEngine from a path in a worker process. |
create_from_schema | Creates an empty FlowDataEngine from a schema definition. |
create_from_sql | Creates a FlowDataEngine by executing a SQL query. |
create_random | Creates a FlowDataEngine with randomly generated data. |
do_cross_join | Performs a cross join with another DataFrame. |
do_filter | Filters rows based on a predicate expression. |
do_fuzzy_join | Performs a fuzzy join with another DataFrame. |
do_group_by | Performs a group-by operation on the DataFrame. |
do_pivot | Converts the DataFrame from a long to a wide format, aggregating values. |
do_select | Performs a complex column selection, renaming, and reordering operation. |
do_sort | Sorts the DataFrame by one or more columns. |
drop_columns | Drops specified columns from the DataFrame. |
from_cloud_storage_obj | Creates a FlowDataEngine from an object in cloud storage. |
fuzzy_match | Performs a simple fuzzy match between two DataFrames on a single column pair. |
generate_enumerator | Generates a FlowDataEngine with a single column containing a sequence of integers. |
get_estimated_file_size | Estimates the file size in bytes if the data originated from a local file. |
get_number_of_records | Gets the total number of records in the DataFrame. |
get_output_sample | Gets a sample of the data as a list of dictionaries. |
get_record_count | Returns a new FlowDataEngine with a single column 'number_of_records'. |
get_sample | Gets a sample of rows from the DataFrame. |
get_schema_column | Retrieves the schema information for a single column by its name. |
get_select_inputs | Gets |
get_subset | Gets the first |
initialize_empty_fl | Initializes an empty LazyFrame. |
iter_batches | Iterates over the DataFrame in batches. |
join | Performs a standard SQL-style join with another DataFrame. |
make_unique | Gets the unique rows from the DataFrame. |
output | Writes the DataFrame to an output file. |
reorganize_order | Reorganizes columns into a specified order. |
save | Saves the DataFrame to a file in a separate thread. |
select_columns | Selects a subset of columns from the DataFrame. |
set_streamable | Sets whether DataFrame operations should be streamable. |
solve_graph | Solves a graph problem represented by 'from' and 'to' columns. |
split | Splits a column's text values into multiple rows based on a delimiter. |
start_fuzzy_join | Starts a fuzzy join operation in a background process. |
to_arrow | Converts the DataFrame to a PyArrow Table. |
to_cloud_storage_obj | Writes the DataFrame to an object in cloud storage. |
to_dict | Converts the DataFrame to a Python dictionary of columns. |
to_pylist | Converts the DataFrame to a list of Python dictionaries. |
to_raw_data | Converts the DataFrame to a RawData object. |
unpivot | Converts the DataFrame from a wide to a long format. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
__name__
property
The name of the table.
cols_idx
property
A dictionary mapping column names to their integer index.
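As a minimal illustration of such a name-to-position mapping (the column names are made up, and this is not necessarily how the property is computed):

```python
# Hypothetical column list; the real property reads the engine's columns.
columns = ["id", "name", "amount"]

# Map each column name to its zero-based position.
cols_idx = {name: index for index, name in enumerate(columns)}
```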
data_frame
property
writable
The underlying Polars DataFrame or LazyFrame.
This property provides access to the Polars object that backs the FlowDataEngine. It handles lazy-loading from external sources if necessary.
Returns:
Type | Description |
---|---|
LazyFrame \| DataFrame \| None | The active Polars DataFrame or LazyFrame. |
external_source
property
The external data source, if any.
has_errors
property
Checks if there are any errors.
lazy
property
writable
Indicates if the DataFrame is in lazy mode.
number_of_fields
property
The number of columns (fields) in the DataFrame.
Returns:
Type | Description |
---|---|
int | The integer count of columns. |
schema
property
The schema of the DataFrame as a list of FlowfileColumn objects.
This property lazily calculates the schema if it hasn't been determined yet.
Returns:
Type | Description |
---|---|
List[FlowfileColumn] | A list of FlowfileColumn objects. |
__call__()
Makes the class instance callable, returning itself.
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
__get_sample__(n_rows=100, streamable=True)
Internal method to get a sample of the data.
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
__getitem__(item)
Accesses a specific column or item from the DataFrame.
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
__init__(raw_data=None, path_ref=None, name=None, optimize_memory=True, schema=None, number_of_records=None, calculate_schema_stats=False, streamable=True, number_of_records_callback=None, data_callback=None)
Initializes the FlowDataEngine from various data sources.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
raw_data | Union[List[Dict], List[Any], Dict[str, Any], ParquetFile, DataFrame, LazyFrame, RawData] | The input data. Can be a list of dicts, a Polars DataFrame/LazyFrame, or a RawData object. | None |
path_ref | str | A string path to a Parquet file. | None |
name | str | An optional name for the data engine instance. | None |
optimize_memory | bool | If True, prefers lazy operations to conserve memory. | True |
schema | List[FlowfileColumn] \| List[str] \| Schema | An optional schema definition. Can be a list of FlowfileColumn objects, a list of column names, or a Schema. | None |
number_of_records | int | The number of records, if known. | None |
calculate_schema_stats | bool | If True, computes detailed statistics for each column. | False |
streamable | bool | If True, allows for streaming operations when possible. | True |
number_of_records_callback | Callable | A callback function to retrieve the number of records. | None |
data_callback | Callable | A callback function to retrieve the data. | None |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
__len__()
Returns the number of records in the table.
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
__repr__()
Returns a string representation of the FlowDataEngine.
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
add_new_values(values, col_name=None)
Adds a new column with the provided values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
values | Iterable | An iterable (e.g., list, tuple) of values to add as a new column. | required |
col_name | str | The name for the new column. Defaults to 'new_values'. | None |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new FlowDataEngine instance. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
add_record_id(record_id_settings)
Adds a record ID (row number) column to the DataFrame.
Can generate a simple sequential ID or a grouped ID that resets for each group.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
record_id_settings | RecordIdInput | A RecordIdInput object. | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new FlowDataEngine instance. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
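The difference between a simple sequential ID and a grouped ID that restarts per group can be shown on plain Python rows. This is a sketch only: the function below and its `output_column`, `offset`, and `group_by` parameters are hypothetical and do not describe the fields of `RecordIdInput`.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def add_record_id(rows: List[Dict], output_column: str = "record_id", offset: int = 1,
                  group_by: Optional[Sequence[str]] = None) -> List[Dict]:
    """Return copies of rows with a row-number column (hypothetical helper)."""
    counters: Dict[Hashable, int] = defaultdict(lambda: offset)
    result = []
    for row in rows:
        # One shared counter without grouping, otherwise one counter per group key.
        key = tuple(row[col] for col in group_by) if group_by else ()
        result.append({output_column: counters[key], **row})
        counters[key] += 1
    return result


rows = [{"team": "a"}, {"team": "b"}, {"team": "a"}]
simple = [r["record_id"] for r in add_record_id(rows)]
grouped = [r["record_id"] for r in add_record_id(rows, group_by=["team"])]
```

Here `simple` numbers the rows 1, 2, 3, while `grouped` restarts at 1 for each team and yields 1, 1, 2.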
apply_flowfile_formula(func, col_name, output_data_type=None)
Applies a formula to create a new column or transform an existing one.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | str | A string containing a Polars expression formula. | required |
col_name | str | The name of the new or transformed column. | required |
output_data_type | DataType | The desired Polars data type for the output column. | None |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new FlowDataEngine instance. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
apply_sql_formula(func, col_name, output_data_type=None)
Applies an SQL-style formula using pl.sql_expr.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | str | A string containing an SQL expression. | required |
col_name | str | The name of the new or transformed column. | required |
output_data_type | DataType | The desired Polars data type for the output column. | None |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new FlowDataEngine instance. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
assert_equal(other, ordered=True, strict_schema=False)
Asserts that this DataFrame is equal to another.
Useful for testing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | FlowDataEngine | The other FlowDataEngine instance. | required |
ordered | bool | If True, the row order must be identical. | True |
strict_schema | bool | If True, the data types of the schemas must be identical. | False |
Raises:
Type | Description |
---|---|
Exception | If the DataFrames are not equal based on the specified criteria. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
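The effect of the `ordered` and `strict_schema` flags can be sketched on plain lists of dictionaries. The helper below is a hypothetical stand-in, not the method's implementation: it takes the column types as separate `left_types` and `right_types` dictionaries, which is an assumption made to keep the example free of Polars.

```python
from typing import Dict, List, Optional


def assert_tables_equal(left: List[Dict], right: List[Dict], ordered: bool = True,
                        strict_schema: bool = False,
                        left_types: Optional[Dict[str, str]] = None,
                        right_types: Optional[Dict[str, str]] = None) -> None:
    """Hypothetical helper mirroring the documented flags on plain rows."""
    if strict_schema and left_types != right_types:
        raise AssertionError("Column data types differ")
    if not ordered:
        # Ignore row order by sorting both sides on a stable text key.
        def sort_key(row: Dict) -> str:
            return repr(sorted(row.items()))
        left, right = sorted(left, key=sort_key), sorted(right, key=sort_key)
    if left != right:
        raise AssertionError("Rows differ")


a = [{"id": 1}, {"id": 2}]
b = [{"id": 2}, {"id": 1}]

assert_tables_equal(a, b, ordered=False)  # passes: same rows, different order

try:
    assert_tables_equal(a, b, ordered=True)
    ordered_check_failed = False
except AssertionError:
    ordered_check_failed = True
```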
cache()
Caches the current DataFrame to disk and updates the internal reference.
This triggers a background process to write the current LazyFrame's result to a temporary file. Subsequent operations on this FlowDataEngine instance will read from the cached file, which can speed up downstream computations.
Returns:
Type | Description |
---|---|
FlowDataEngine | The same FlowDataEngine instance. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
calculate_schema()
Calculates and returns the schema.
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
change_column_types(transforms, calculate_schema=False)
Changes the data type of one or more columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transforms | List[SelectInput] | A list of SelectInput objects. | required |
calculate_schema | bool | If True, recalculates the schema after the type change. | False |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new FlowDataEngine instance. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 |
|
collect(n_records=None)
Collects the data and returns it as a Polars DataFrame.
This method triggers the execution of the lazy query plan (if applicable) and returns the result. It supports streaming to optimize memory usage for large datasets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_records | int | The maximum number of records to collect. If None, all records are collected. | None |
Returns:
Type | Description |
---|---|
DataFrame | A Polars |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
collect_external()
Materializes data from a tracked external source.
If the FlowDataEngine was created from an ExternalDataSource, this method will trigger the data retrieval, update the internal _data_frame to a LazyFrame of the collected data, and reset the schema to be re-evaluated.
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
concat(other)
Concatenates this DataFrame with one or more other DataFrames.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Iterable[FlowDataEngine] \| FlowDataEngine | A single | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
count()
Gets the total number of records.
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
create_from_external_source(external_source)
classmethod
Creates a FlowDataEngine from an external data source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
external_source | ExternalDataSource | An object that conforms to the | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
create_from_path(received_table)
classmethod
Creates a FlowDataEngine from a local file path.
Supports various file types like CSV, Parquet, and Excel.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
received_table | ReceivedTableBase | A | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
create_from_path_worker(received_table, flow_id, node_id)
classmethod
Creates a FlowDataEngine from a path in a worker process.
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
create_from_schema(schema)
classmethod
Creates an empty FlowDataEngine from a schema definition.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | List[FlowfileColumn] | A list of | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new, empty |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
create_from_sql(sql, conn)
classmethod
Creates a FlowDataEngine by executing a SQL query.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sql | str | The SQL query string to execute. | required |
conn | Any | A database connection object or connection URI string. | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
create_random(number_of_records=1000)
classmethod
Creates a FlowDataEngine with randomly generated data.
Useful for testing and examples.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
number_of_records | int | The number of random records to generate. | 1000 |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
do_cross_join(cross_join_input, auto_generate_selection, verify_integrity, other)
Performs a cross join with another DataFrame.
A cross join produces the Cartesian product of the two DataFrames.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cross_join_input | CrossJoinInput | A | required |
auto_generate_selection | bool | If True, automatically renames columns to avoid conflicts. | required |
verify_integrity | bool | If True, checks if the resulting join would be too large. | required |
other | FlowDataEngine | The right | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Raises:
Type | Description |
---|---|
Exception | If |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
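The Cartesian product and the size check behind verify_integrity can be sketched in plain Python. This is an illustration only, not flowfile's code: the cross_join helper, the MAX_ROWS limit, and the choice of ValueError are all assumptions, since the source does not state the real threshold or exception details.

```python
from itertools import product
from typing import Any, Dict, List

MAX_ROWS = 1_000  # hypothetical limit; the real threshold is not given in the source


def cross_join(
    left: List[Dict[str, Any]],
    right: List[Dict[str, Any]],
    verify_integrity: bool = True,
) -> List[Dict[str, Any]]:
    """Pair every left row with every right row (Cartesian product)."""
    expected = len(left) * len(right)
    # The output size is known before any work is done, so it can be checked first.
    if verify_integrity and expected > MAX_ROWS:
        raise ValueError(f"cross join would produce {expected} rows")
    return [{**l, **r} for l, r in product(left, right)]


colors = [{"color": "red"}, {"color": "blue"}]
sizes = [{"size": "S"}, {"size": "M"}, {"size": "L"}]
pairs = cross_join(colors, sizes)
```

Because the result has len(left) * len(right) rows, checking the product of the two row counts up front is a cheap way to refuse joins that would explode in size.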
do_filter(predicate)
Filters rows based on a predicate expression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predicate | str | A string containing a Polars expression that evaluates to a boolean value. | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
FlowDataEngine | the predicate. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
do_fuzzy_join(fuzzy_match_input, other, file_ref, flow_id=-1, node_id=-1)
Performs a fuzzy join with another DataFrame.
This method blocks until the fuzzy join operation is complete.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fuzzy_match_input | FuzzyMatchInput | A | required |
other | FlowDataEngine | The right | required |
file_ref | str | A reference string for temporary files. | required |
flow_id | int | The flow ID for tracking. | -1 |
node_id | int \| str | The node ID for tracking. | -1 |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
do_group_by(group_by_input, calculate_schema_stats=True)
Performs a group-by operation on the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
group_by_input | GroupByInput | A | required |
calculate_schema_stats | bool | If True, calculates schema statistics for the resulting DataFrame. | True |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
do_pivot(pivot_input, node_logger=None)
Converts the DataFrame from a long to a wide format, aggregating values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pivot_input | PivotInput | A | required |
node_logger | NodeLogger | An optional logger for reporting warnings, e.g., if the pivot column has too many unique values. | None |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new, pivoted |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
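As a rough picture of what a long-to-wide pivot with aggregation does, the sketch below sums values per index and pivot value using only the standard library. The pivot_sum helper and its argument names are hypothetical and do not reflect the fields of PivotInput.

```python
from collections import defaultdict
from typing import Any, Dict, List


def pivot_sum(
    rows: List[Dict[str, Any]], index: str, pivot_col: str, value_col: str
) -> List[Dict[str, Any]]:
    """Turn each distinct value of pivot_col into its own column, summing value_col."""
    wide: Dict[Any, Dict[Any, float]] = defaultdict(lambda: defaultdict(float))
    for row in rows:
        wide[row[index]][row[pivot_col]] += row[value_col]
    # One output row per index value; one output column per pivot value seen.
    return [{index: key, **dict(cols)} for key, cols in wide.items()]


long_rows = [
    {"region": "EU", "quarter": "Q1", "sales": 10.0},
    {"region": "EU", "quarter": "Q2", "sales": 5.0},
    {"region": "EU", "quarter": "Q1", "sales": 2.5},
    {"region": "US", "quarter": "Q1", "sales": 7.0},
]
wide_rows = pivot_sum(long_rows, "region", "quarter", "sales")
```

Every distinct value in the pivot column becomes an output column, which is why a pivot column with many unique values is worth a warning.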
do_select(select_inputs, keep_missing=True)
Performs a complex column selection, renaming, and reordering operation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
select_inputs | SelectInputs | A | required |
keep_missing | bool | If True, columns not specified in | True |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
do_sort(sorts)
Sorts the DataFrame by one or more columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sorts | List[SortByInput] | A list of | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
drop_columns(columns)
Drops specified columns from the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | List[str] | A list of column names to drop. | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
from_cloud_storage_obj(settings)
classmethod
Creates a FlowDataEngine from an object in cloud storage.
This method supports reading from various cloud storage providers like AWS S3, Azure Data Lake Storage, and Google Cloud Storage, with support for various authentication methods.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
settings | CloudStorageReadSettingsInternal | A | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Raises:
Type | Description |
---|---|
ValueError | If the storage type or file format is not supported. |
NotImplementedError | If a requested file format like "delta" or "iceberg" is not yet implemented. |
Exception | If reading from cloud storage fails. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
fuzzy_match(right, left_on, right_on, fuzzy_method='levenshtein', threshold=0.75)
Performs a simple fuzzy match between two DataFrames on a single column pair.
This is a convenience method for a common fuzzy join scenario.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
right | FlowDataEngine | The right | required |
left_on | str | The column name from the left DataFrame to match on. | required |
right_on | str | The column name from the right DataFrame to match on. | required |
fuzzy_method | str | The fuzzy matching algorithm to use (e.g., 'levenshtein'). | 'levenshtein' |
threshold | float | The similarity score threshold (0.0 to 1.0) for a match. | 0.75 |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
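To show how a Levenshtein-based similarity score interacts with a threshold such as 0.75, here is a small standard-library sketch. The normalization used (1 minus the edit distance divided by the longer string's length) is an assumption; the source does not say how flowfile scales its scores, and the helper names are hypothetical.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for nothing in common."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def fuzzy_pairs(left: List[str], right: List[str], threshold: float = 0.75) -> List[Tuple[str, str]]:
    """Keep every left/right pair whose similarity reaches the threshold."""
    return [(l, r) for l in left for r in right if similarity(l, r) >= threshold]


matches = fuzzy_pairs(["flowfile", "polars"], ["flowfiles", "pandas"])
```

With this scoring, "flowfile" and "flowfiles" differ by one edit over nine characters and pass the threshold, while "polars" and "pandas" do not.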
generate_enumerator(length=1000, output_name='output_column')
classmethod
Generates a FlowDataEngine with a single column containing a sequence of integers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
length | int | The number of integers to generate in the sequence. | 1000 |
output_name | str | The name of the output column. | 'output_column' |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
get_estimated_file_size()
Estimates the file size in bytes if the data originated from a local file.
This relies on the original path being tracked during file ingestion.
Returns:
Type | Description |
---|---|
int | The file size in bytes, or 0 if the original path is unknown. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
get_number_of_records(warn=False, force_calculate=False, calculate_in_worker_process=False)
Gets the total number of records in the DataFrame.
For lazy frames, this may trigger a full data scan, which can be expensive.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
warn | bool | If True, logs a warning if a potentially expensive calculation is triggered. | False |
force_calculate | bool | If True, forces recalculation even if a value is cached. | False |
calculate_in_worker_process | bool | If True, offloads the calculation to a worker process. | False |
Returns:
Type | Description |
---|---|
int | The total number of records. |
Raises:
Type | Description |
---|---|
ValueError | If the number of records could not be determined. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
get_output_sample(n_rows=10)
Gets a sample of the data as a list of dictionaries.
This is typically used to display a preview of the data in a UI.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_rows | int | The number of rows to sample. | 10 |
Returns:
Type | Description |
---|---|
List[Dict] | A list of dictionaries, where each dictionary represents a row. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
get_record_count()
Returns a new FlowDataEngine with a single column 'number_of_records' containing the total number of records.
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
get_sample(n_rows=100, random=False, shuffle=False, seed=None)
Gets a sample of rows from the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_rows | int | The number of rows to sample. | 100 |
random | bool | If True, performs random sampling. If False, takes the first n_rows. | False |
shuffle | bool | If True (and | False |
seed | int | A random seed for reproducibility. | None |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
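The difference between taking the first n_rows and drawing a seeded random sample can be sketched with the standard library. The sample_rows helper below is hypothetical and works on a plain list; it does not model the shuffle option, whose description is incomplete in the source.

```python
import random
from typing import Any, List, Optional


def sample_rows(
    rows: List[Any], n_rows: int = 100, use_random: bool = False, seed: Optional[int] = None
) -> List[Any]:
    """Return the head of the data, or a reproducible random sample of it."""
    if not use_random:
        return rows[:n_rows]
    rng = random.Random(seed)  # a private generator keeps the global state untouched
    return rng.sample(rows, min(n_rows, len(rows)))


data = list(range(20))
head = sample_rows(data, n_rows=3)
draw_one = sample_rows(data, n_rows=3, use_random=True, seed=42)
draw_two = sample_rows(data, n_rows=3, use_random=True, seed=42)
```

Passing the same seed yields the same sample on every call, which is what makes a random preview reproducible.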
get_schema_column(col_name)
Retrieves the schema information for a single column by its name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col_name | str | The name of the column to retrieve. | required |
Returns:
Type | Description |
---|---|
FlowfileColumn | A |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
get_select_inputs()
Gets SelectInput specifications for all columns in the current schema.
Returns:
Type | Description |
---|---|
SelectInputs | A |
SelectInputs | transformation operations. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
get_subset(n_rows=100)
Gets the first n_rows from the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_rows | int | The number of rows to include in the subset. | 100 |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
initialize_empty_fl()
Initializes an empty LazyFrame.
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
iter_batches(batch_size=1000, columns=None)
Iterates over the DataFrame in batches.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_size | int | The size of each batch. | 1000 |
columns | Union[List, Tuple, str] | A list of column names to include in the batches. If None, all columns are included. | None |
Yields:
Type | Description |
---|---|
FlowDataEngine | A |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
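Batch iteration with an optional column projection can be pictured with a short generator. This is a plain-Python sketch over a list of row dictionaries, not the method's actual implementation; the free function below only borrows the documented parameter names.

```python
from typing import Any, Dict, Iterator, List, Optional, Sequence, Union


def iter_batches(
    rows: List[Dict[str, Any]],
    batch_size: int = 1000,
    columns: Optional[Union[Sequence[str], str]] = None,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most batch_size rows."""
    if isinstance(columns, str):
        columns = [columns]  # accept a single column name as well as a list
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        if columns is not None:
            batch = [{c: row[c] for c in columns} for row in batch]
        yield batch


rows = [{"id": i, "name": f"n{i}"} for i in range(5)]
batches = list(iter_batches(rows, batch_size=2, columns="id"))
batch_sizes = [len(batch) for batch in batches]
```

The final batch is simply shorter when the row count is not a multiple of batch_size.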
join(join_input, auto_generate_selection, verify_integrity, other)
Performs a standard SQL-style join with another DataFrame.
Supports various join types like 'inner', 'left', 'right', 'outer', 'semi', and 'anti'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
join_input | JoinInput | A | required |
auto_generate_selection | bool | If True, automatically handles column renaming. | required |
verify_integrity | bool | If True, performs checks to prevent excessively large joins. | required |
other | FlowDataEngine | The right | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Raises:
Type | Description |
---|---|
Exception | If the join configuration is invalid or if |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
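For readers less familiar with the listed join types, the sketch below contrasts 'inner', 'left', 'semi', and 'anti' on a single key column using plain dictionaries. It is illustrative only: the join_rows helper is hypothetical, and it ignores JoinInput, column renaming, and integrity checks.

```python
from typing import Any, Dict, List


def join_rows(
    left: List[Dict[str, Any]], right: List[Dict[str, Any]], key: str, how: str = "inner"
) -> List[Dict[str, Any]]:
    """Hash join on one key column for a few SQL-style join types."""
    index: Dict[Any, List[Dict[str, Any]]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    right_cols = [c for c in (right[0] if right else {}) if c != key]

    out: List[Dict[str, Any]] = []
    for row in left:
        matches = index.get(row[key], [])
        if how == "semi":      # left rows that have a match, left columns only
            if matches:
                out.append(row)
        elif how == "anti":    # left rows that have no match
            if not matches:
                out.append(row)
        elif how in ("inner", "left"):
            out.extend({**row, **m} for m in matches)
            if how == "left" and not matches:
                # Unmatched left rows are kept, with nulls for the right columns.
                out.append({**row, **{c: None for c in right_cols}})
        else:
            raise ValueError(f"unsupported join type: {how}")
    return out


orders = [{"cust": 1, "item": "a"}, {"cust": 2, "item": "b"}, {"cust": 3, "item": "c"}]
customers = [{"cust": 1, "name": "Ann"}, {"cust": 2, "name": "Bob"}]
inner = join_rows(orders, customers, "cust", "inner")
left = join_rows(orders, customers, "cust", "left")
semi = join_rows(orders, customers, "cust", "semi")
anti = join_rows(orders, customers, "cust", "anti")
```

Semi and anti joins never add columns from the right side; they only decide which left rows survive.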
make_unique(unique_input=None)
Gets the unique rows from the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unique_input | UniqueInput | A | None |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
output(output_fs, flow_id, node_id, execute_remote=True)
Writes the DataFrame to an output file.
Can execute the write operation locally or in a remote worker process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_fs | OutputSettings | An | required |
flow_id | int | The flow ID for tracking. | required |
node_id | int \| str | The node ID for tracking. | required |
execute_remote | bool | If True, executes the write in a worker process. | True |
Returns:
Type | Description |
---|---|
FlowDataEngine | The same |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
reorganize_order(column_order)
Reorganizes columns into a specified order.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_order | List[str] | A list of column names in the desired order. | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
save(path, data_type='parquet')
Saves the DataFrame to a file in a separate thread.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The file path to save to. | required |
data_type | str | The format to save in (e.g., 'parquet', 'csv'). | 'parquet' |
Returns:
Type | Description |
---|---|
Future | A |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
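The pattern of saving in a separate thread and handing back a Future can be sketched with concurrent.futures. This is an assumption-laden illustration: the save_rows helper, the CSV format, and the single-worker executor are hypothetical and say nothing about how flowfile schedules its own writes.

```python
import csv
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Dict, List

_executor = ThreadPoolExecutor(max_workers=1)  # hypothetical background writer


def save_rows(rows: List[Dict[str, Any]], path: str) -> Future:
    """Start writing rows to a CSV file and return immediately with a Future."""

    def _write() -> str:
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        return path

    return _executor.submit(_write)


target = os.path.join(tempfile.mkdtemp(), "out.csv")
future = save_rows([{"id": 1}, {"id": 2}], target)
written_path = future.result()  # blocks until the write has finished
```

The caller stays free to do other work and only blocks when it calls result(), which also re-raises any error that occurred during the write.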
select_columns(list_select)
Selects a subset of columns from the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
list_select | Union[List[str], Tuple[str], str] | A list, tuple, or single string of column names to select. | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
set_streamable(streamable=False)
Sets whether DataFrame operations should be streamable.
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
solve_graph(graph_solver_input)
Solves a graph problem represented by 'from' and 'to' columns.
This is used for operations like finding connected components in a graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph_solver_input | GraphSolverInput | A | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
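Finding connected components from 'from' and 'to' pairs can be illustrated with a small union-find. This sketch is not the algorithm flowfile uses (the source does not say which one it is); the connected_components helper and its output format are assumptions.

```python
from typing import Dict, Hashable, Iterable, Tuple


def connected_components(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Hashable]:
    """Map every node to a representative node of its connected component."""
    parent: Dict[Hashable, Hashable] = {}

    def find(node: Hashable) -> Hashable:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for source, target in edges:
        root_a, root_b = find(source), find(target)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components
    return {node: find(node) for node in list(parent)}


groups = connected_components([("a", "b"), ("b", "c"), ("x", "y")])
```

Nodes that share a representative belong to the same group, so the representative can serve as a group identifier column.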
split(split_input)
Splits a column's text values into multiple rows based on a delimiter.
This operation is often referred to as "exploding" the DataFrame, as it increases the number of rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
split_input | TextToRowsInput | A | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
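The "exploding" behaviour described above can be shown on plain dictionaries. The text_to_rows helper is hypothetical, and trimming whitespace around each piece is an assumption made for readability, not documented behaviour.

```python
from typing import Any, Dict, List


def text_to_rows(rows: List[Dict[str, Any]], column: str, delimiter: str = ",") -> List[Dict[str, Any]]:
    """Emit one output row per delimited piece, copying the other columns."""
    out: List[Dict[str, Any]] = []
    for row in rows:
        for piece in row[column].split(delimiter):
            out.append({**row, column: piece.strip()})  # strip() is an assumption
    return out


tagged = [{"id": 1, "tags": "a, b"}, {"id": 2, "tags": "c"}]
exploded = text_to_rows(tagged, "tags")
```

The row with two tags becomes two rows with the same id, so the output has more rows than the input.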
start_fuzzy_join(fuzzy_match_input, other, file_ref, flow_id=-1, node_id=-1)
Starts a fuzzy join operation in a background process.
This method prepares the data and initiates the fuzzy matching in a separate process, returning a tracker object immediately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fuzzy_match_input | FuzzyMatchInput | A | required |
other | FlowDataEngine | The right | required |
file_ref | str | A reference string for temporary files. | required |
flow_id | int | The flow ID for tracking. | -1 |
node_id | int \| str | The node ID for tracking. | -1 |
Returns:
Type | Description |
---|---|
ExternalFuzzyMatchFetcher | An |
ExternalFuzzyMatchFetcher | progress and retrieve the result of the fuzzy join. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
to_arrow()
Converts the DataFrame to a PyArrow Table.
This method triggers a .collect() call if the data is lazy, then converts the resulting eager DataFrame into a pyarrow.Table.
Returns:
Type | Description |
---|---|
Table | A |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
to_cloud_storage_obj(settings)
Writes the DataFrame to an object in cloud storage.
This method supports writing to various cloud storage providers like AWS S3, Azure Data Lake Storage, and Google Cloud Storage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
settings | CloudStorageWriteSettingsInternal | A | required |
Raises:
Type | Description |
---|---|
ValueError | If the specified file format is not supported for writing. |
NotImplementedError | If the 'append' write mode is used with an unsupported format. |
Exception | If the write operation to cloud storage fails for any reason. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
to_dict()
Converts the DataFrame to a Python dictionary of columns.
Each key in the dictionary is a column name, and the corresponding value is a list of the data in that column.
Returns:
Type | Description |
---|---|
Dict[str, List] | A dictionary mapping column names to lists of their values. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
to_pylist()
Converts the DataFrame to a list of Python dictionaries.
Returns:
Type | Description |
---|---|
List[Dict] | A list where each item is a dictionary representing a row. |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
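The two output shapes, a dictionary of columns (to_dict) and a list of row dictionaries (to_pylist), hold the same data in different layouts. The sketch below converts between them in plain Python; both helper names are hypothetical and exist only to show the shapes.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Row-oriented list of dicts -> column-oriented dict of lists."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Column-oriented dict of lists -> row-oriented list of dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


as_rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
as_columns = rows_to_columns(as_rows)
round_trip = columns_to_rows(as_columns)
```

The column layout suits per-column work such as plotting or statistics, while the row layout is convenient for JSON responses and UI previews.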
to_raw_data()
Converts the DataFrame to a RawData schema object.
Returns:
Type | Description |
---|---|
RawData | An |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
unpivot(unpivot_input)
Converts the DataFrame from a wide to a long format.
This is the inverse of a pivot operation, taking columns and transforming them into variable and value rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unpivot_input | UnpivotInput | An | required |
Returns:
Type | Description |
---|---|
FlowDataEngine | A new, unpivoted |
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_data_engine.py
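A wide-to-long reshape into variable and value rows looks like this in plain Python. The unpivot_rows helper and its arguments are hypothetical and are not the fields of UnpivotInput.

```python
from typing import Any, Dict, List, Sequence


def unpivot_rows(
    rows: List[Dict[str, Any]], index: Sequence[str], value_columns: Sequence[str]
) -> List[Dict[str, Any]]:
    """Emit one (variable, value) row per value column, repeating the index columns."""
    return [
        {**{name: row[name] for name in index}, "variable": column, "value": row[column]}
        for row in rows
        for column in value_columns
    ]


wide = [{"region": "EU", "Q1": 12.5, "Q2": 5.0}]
long_form = unpivot_rows(wide, index=["region"], value_columns=["Q1", "Q2"])
```

Each input row produces as many output rows as there are value columns, with the former column name stored under variable.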
FlowfileColumn
The FlowfileColumn is a data class that holds the schema and rich metadata for a single column managed by the FlowDataEngine.
flowfile_core.flowfile.flow_data_engine.flow_file_column.main.FlowfileColumn
dataclass
Source code in flowfile_core/flowfile_core/flowfile/flow_data_engine/flow_file_column/main.py
Data Modeling (Schemas)
This section documents the Pydantic models that define the structure of settings and data.
schemas
flowfile_core.schemas.schemas
Classes:
Name | Description |
---|---|
FlowGraphConfig | Configuration model for a flow graph's basic properties. |
FlowInformation | Represents the complete state of a flow, including settings, nodes, and connections. |
FlowSettings | Extends FlowGraphConfig with additional operational settings for a flow. |
NodeDefault | Defines default properties for a node type. |
NodeEdge | Represents a connection (edge) between two nodes in the frontend. |
NodeInformation | Stores the state and configuration of a specific node instance within a flow. |
NodeInput | Represents a node as it is received from the frontend, including position. |
NodeTemplate | Defines the template for a node type, specifying its UI and functional characteristics. |
RawLogInput | Schema for a raw log message. |
VueFlowInput | Represents the complete graph structure from the Vue-based frontend. |
FlowGraphConfig
pydantic-model
Bases: BaseModel
Configuration model for a flow graph's basic properties.
Attributes:
Name | Type | Description |
---|---|---|
flow_id | int | Unique identifier for the flow. |
description | Optional[str] | A description of the flow. |
save_location | Optional[str] | The location where the flow is saved. |
name | str | The name of the flow. |
path | str | The file path associated with the flow. |
execution_mode | ExecutionModeLiteral | The mode of execution ('Development' or 'Performance'). |
execution_location | ExecutionLocationsLiteral | The location for execution ('auto', 'local', 'remote'). |
Show JSON schema:
{
  "description": "Configuration model for a flow graph's basic properties.\n\nAttributes:\n flow_id (int): Unique identifier for the flow.\n description (Optional[str]): A description of the flow.\n save_location (Optional[str]): The location where the flow is saved.\n name (str): The name of the flow.\n path (str): The file path associated with the flow.\n execution_mode (ExecutionModeLiteral): The mode of execution ('Development' or 'Performance').\n execution_location (ExecutionLocationsLiteral): The location for execution ('auto', 'local', 'remote').",
  "properties": {
    "flow_id": {
      "description": "Unique identifier for the flow.",
      "title": "Flow Id",
      "type": "integer"
    },
    "description": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Description"
    },
    "save_location": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Save Location"
    },
    "name": {
      "default": "",
      "title": "Name",
      "type": "string"
    },
    "path": {
      "default": "",
      "title": "Path",
      "type": "string"
    },
    "execution_mode": {
      "default": "Performance",
      "enum": [
        "Development",
        "Performance"
      ],
      "title": "Execution Mode",
      "type": "string"
    },
    "execution_location": {
      "default": "auto",
      "enum": [
        "auto",
        "local",
        "remote"
      ],
      "title": "Execution Location",
      "type": "string"
    }
  },
  "title": "FlowGraphConfig",
  "type": "object"
}
Fields:
- flow_id (int)
- description (Optional[str])
- save_location (Optional[str])
- name (str)
- path (str)
- execution_mode (ExecutionModeLiteral)
- execution_location (ExecutionLocationsLiteral)
Source code in flowfile_core/flowfile_core/schemas/schemas.py
flow_id
pydantic-field
Unique identifier for the flow.
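The defaults and enumerations in the JSON schema above can be mirrored in a small standard-library stand-in, which may help when reasoning about what a valid configuration looks like. This is not the Pydantic model itself: the FlowGraphConfigSketch class is hypothetical, and flow_id is made a required argument here for simplicity even though the schema lists no required entries.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values copied from the "enum" entries of the JSON schema.
EXECUTION_MODES = ("Development", "Performance")
EXECUTION_LOCATIONS = ("auto", "local", "remote")


@dataclass
class FlowGraphConfigSketch:
    """Hypothetical stand-in that mirrors the schema's fields and defaults."""

    flow_id: int  # assumption: required here; the real model may generate it
    description: Optional[str] = None
    save_location: Optional[str] = None
    name: str = ""
    path: str = ""
    execution_mode: str = "Performance"
    execution_location: str = "auto"

    def __post_init__(self) -> None:
        # Reject values outside the enumerations, as schema validation would.
        if self.execution_mode not in EXECUTION_MODES:
            raise ValueError(f"invalid execution_mode: {self.execution_mode!r}")
        if self.execution_location not in EXECUTION_LOCATIONS:
            raise ValueError(f"invalid execution_location: {self.execution_location!r}")


config = FlowGraphConfigSketch(flow_id=1, name="demo")
```

Constructing the sketch with only flow_id and name leaves execution_mode at 'Performance' and execution_location at 'auto', matching the defaults shown in the schema.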
FlowInformation
pydantic-model
Bases: BaseModel
Represents the complete state of a flow, including settings, nodes, and connections.
Attributes:
Name | Type | Description |
---|---|---|
flow_id | int | The unique ID of the flow. |
flow_name | Optional[str] | The name of the flow. |
flow_settings | FlowSettings | The settings for the flow. |
data | Dict[int, NodeInformation] | A dictionary mapping node IDs to their information. |
node_starts | List[int] | A list of starting node IDs. |
node_connections | List[Tuple[int, int]] | A list of tuples representing connections between nodes. |
Show JSON schema:
{
"$defs": {
"FlowSettings": {
"description": "Extends FlowGraphConfig with additional operational settings for a flow.\n\nAttributes:\n auto_save (bool): Flag to enable or disable automatic saving.\n modified_on (Optional[float]): Timestamp of the last modification.\n show_detailed_progress (bool): Flag to show detailed progress during execution.\n is_running (bool): Indicates if the flow is currently running.\n is_canceled (bool): Indicates if the flow execution has been canceled.",
"properties": {
"flow_id": {
"description": "Unique identifier for the flow.",
"title": "Flow Id",
"type": "integer"
},
"description": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Description"
},
"save_location": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Save Location"
},
"name": {
"default": "",
"title": "Name",
"type": "string"
},
"path": {
"default": "",
"title": "Path",
"type": "string"
},
"execution_mode": {
"default": "Performance",
"enum": [
"Development",
"Performance"
],
"title": "Execution Mode",
"type": "string"
},
"execution_location": {
"default": "auto",
"enum": [
"auto",
"local",
"remote"
],
"title": "Execution Location",
"type": "string"
},
"auto_save": {
"default": false,
"title": "Auto Save",
"type": "boolean"
},
"modified_on": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": null,
"title": "Modified On"
},
"show_detailed_progress": {
"default": true,
"title": "Show Detailed Progress",
"type": "boolean"
},
"is_running": {
"default": false,
"title": "Is Running",
"type": "boolean"
},
"is_canceled": {
"default": false,
"title": "Is Canceled",
"type": "boolean"
}
},
"title": "FlowSettings",
"type": "object"
},
"NodeInformation": {
"description": "Stores the state and configuration of a specific node instance within a flow.\n\nAttributes:\n id (Optional[int]): The unique ID of the node instance.\n type (Optional[str]): The type of the node (e.g., 'join', 'filter').\n is_setup (Optional[bool]): Whether the node has been configured.\n description (Optional[str]): A user-provided description.\n x_position (Optional[int]): The x-coordinate on the canvas.\n y_position (Optional[int]): The y-coordinate on the canvas.\n left_input_id (Optional[int]): The ID of the node connected to the left input.\n right_input_id (Optional[int]): The ID of the node connected to the right input.\n input_ids (Optional[List[int]]): A list of IDs for main input nodes.\n outputs (Optional[List[int]]): A list of IDs for nodes this node outputs to.\n setting_input (Optional[Any]): The specific settings for this node instance.",
"properties": {
"id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Id"
},
"type": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Type"
},
"is_setup": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"title": "Is Setup"
},
"description": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"title": "Description"
},
"x_position": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"title": "X Position"
},
"y_position": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"title": "Y Position"
},
"left_input_id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Left Input Id"
},
"right_input_id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Right Input Id"
},
"input_ids": {
"anyOf": [
{
"items": {
"type": "integer"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
-1
],
"title": "Input Ids"
},
"outputs": {
"anyOf": [
{
"items": {
"type": "integer"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
-1
],
"title": "Outputs"
},
"setting_input": {
"anyOf": [
{},
{
"type": "null"
}
],
"default": null,
"title": "Setting Input"
}
},
"title": "NodeInformation",
"type": "object"
}
},
"description": "Represents the complete state of a flow, including settings, nodes, and connections.\n\nAttributes:\n flow_id (int): The unique ID of the flow.\n flow_name (Optional[str]): The name of the flow.\n flow_settings (FlowSettings): The settings for the flow.\n data (Dict[int, NodeInformation]): A dictionary mapping node IDs to their information.\n node_starts (List[int]): A list of starting node IDs.\n node_connections (List[Tuple[int, int]]): A list of tuples representing connections between nodes.",
"properties": {
"flow_id": {
"title": "Flow Id",
"type": "integer"
},
"flow_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"title": "Flow Name"
},
"flow_settings": {
"$ref": "#/$defs/FlowSettings"
},
"data": {
"additionalProperties": {
"$ref": "#/$defs/NodeInformation"
},
"default": {},
"title": "Data",
"type": "object"
},
"node_starts": {
"items": {
"type": "integer"
},
"title": "Node Starts",
"type": "array"
},
"node_connections": {
"default": [],
"items": {
"maxItems": 2,
"minItems": 2,
"prefixItems": [
{
"type": "integer"
},
{
"type": "integer"
}
],
"type": "array"
},
"title": "Node Connections",
"type": "array"
}
},
"required": [
"flow_id",
"flow_settings",
"node_starts"
],
"title": "FlowInformation",
"type": "object"
}
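To make the schema above concrete, here is a small stdlib-only sketch of a payload that satisfies its `required` list, plus a check for the shape of `node_connections`. `flow_information_problems` is a hypothetical helper and all values are illustrative. The sketch also shows a general JSON property that matters for the `data` field: integer dictionary keys become strings once the payload is serialized.

```python
import json
from typing import Any, Dict, List

# Copied from the "required" list of the FlowInformation schema.
REQUIRED = ("flow_id", "flow_settings", "node_starts")


def flow_information_problems(doc: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: report missing required keys and malformed connections."""
    problems = [f"missing required field: {key}" for key in REQUIRED if key not in doc]
    for pair in doc.get("node_connections", []):
        # The schema describes each connection as exactly two integers.
        if len(pair) != 2 or not all(isinstance(n, int) for n in pair):
            problems.append(f"connection is not a pair of integers: {pair!r}")
    return problems


# Illustrative payload; node 1 feeds node 2 (the direction is an assumption).
payload = {
    "flow_id": 1,
    "flow_name": "demo",
    "flow_settings": {"flow_id": 1, "name": "demo"},
    "data": {
        1: {"id": 1, "outputs": [2]},
        2: {"id": 2, "type": "filter", "input_ids": [1]},
    },
    "node_starts": [1],
    "node_connections": [(1, 2)],
}

# Round-trip through JSON: tuples become lists and integer keys become strings.
decoded = json.loads(json.dumps(payload))
problems = flow_information_problems(decoded)
```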
Fields:
- flow_id (int)
- flow_name (Optional[str])
- flow_settings (FlowSettings)
- data (Dict[int, NodeInformation])
- node_starts (List[int])
- node_connections (List[Tuple[int, int]])
Validators:
- ensure_string → flow_name
Source code in flowfile_core/flowfile_core/schemas/schemas.py
ensure_string(v)
pydantic-validator
Validator that ensures flow_name is always a string. It takes the value to validate (v) and returns it as a string, or an empty string if it is None.
Source code in flowfile_core/flowfile_core/schemas/schemas.py
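The documented behaviour of this validator can be sketched as a plain function. This is an illustration based only on the docstring, not the library's source: converting non-string values with `str()` is an assumption, and the real method is registered as a pydantic validator on `flow_name`.

```python
from typing import Any


def ensure_string(v: Any) -> str:
    """Return v as a string, or an empty string when v is None."""
    # Assumption: non-string values are converted with str().
    return "" if v is None else str(v)


names = [ensure_string(None), ensure_string("demo"), ensure_string(7)]
```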
FlowSettings
pydantic-model
Bases: FlowGraphConfig
Extends FlowGraphConfig with additional operational settings for a flow.
Attributes:
Name | Type | Description |
---|---|---|
auto_save | bool | Flag to enable or disable automatic saving. |
modified_on | Optional[float] | Timestamp of the last modification. |
show_detailed_progress | bool | Flag to show detailed progress during execution. |
is_running | bool | Indicates if the flow is currently running. |
is_canceled | bool | Indicates if the flow execution has been canceled. |
Show JSON schema:
{
"description": "Extends FlowGraphConfig with additional operational settings for a flow.\n\nAttributes:\n auto_save (bool): Flag to enable or disable automatic saving.\n modified_on (Optional[float]): Timestamp of the last modification.\n show_detailed_progress (bool): Flag to show detailed progress during execution.\n is_running (bool): Indicates if the flow is currently running.\n is_canceled (bool): Indicates if the flow execution has been canceled.",
"properties": {
"flow_id": {
"description": "Unique identifier for the flow.",
"title": "Flow Id",
"type": "integer"
},
"description": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Description"
},
"save_location": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Save Location"
},
"name": {
"default": "",
"title": "Name",
"type": "string"
},
"path": {
"default": "",
"title": "Path",
"type": "string"
},
"execution_mode": {
"default": "Performance",
"enum": [
"Development",
"Performance"
],
"title": "Execution Mode",
"type": "string"
},
"execution_location": {
"default": "auto",
"enum": [
"auto",
"local",
"remote"
],
"title": "Execution Location",
"type": "string"
},
"auto_save": {
"default": false,
"title": "Auto Save",
"type": "boolean"
},
"modified_on": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": null,
"title": "Modified On"
},
"show_detailed_progress": {
"default": true,
"title": "Show Detailed Progress",
"type": "boolean"
},
"is_running": {
"default": false,
"title": "Is Running",
"type": "boolean"
},
"is_canceled": {
"default": false,
"title": "Is Canceled",
"type": "boolean"
}
},
"title": "FlowSettings",
"type": "object"
}
Fields:
- flow_id (int)
- description (Optional[str])
- save_location (Optional[str])
- name (str)
- path (str)
- execution_mode (ExecutionModeLiteral)
- execution_location (ExecutionLocationsLiteral)
- auto_save (bool)
- modified_on (Optional[float])
- show_detailed_progress (bool)
- is_running (bool)
- is_canceled (bool)
Source code in flowfile_core/flowfile_core/schemas/schemas.py
from_flow_settings_input(flow_graph_config)
classmethod
Creates a FlowSettings instance from a FlowGraphConfig instance.
It takes flow_graph_config, the base flow graph configuration, and returns a new FlowSettings instance populated with its data.
Source code in flowfile_core/flowfile_core/schemas/schemas.py
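The relationship between the two models can be sketched with stdlib dataclasses. The classes below are stand-ins with a `Sketch` suffix, not the real pydantic models: they copy the field names and defaults from the schemas in this section, and the body of `from_flow_settings_input` is an assumption about what "with data from flow_graph_config" means.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FlowGraphConfigSketch:
    """Stand-in for FlowGraphConfig, using the documented defaults."""
    flow_id: int
    description: Optional[str] = None
    save_location: Optional[str] = None
    name: str = ""
    path: str = ""
    execution_mode: str = "Performance"
    execution_location: str = "auto"


@dataclass
class FlowSettingsSketch(FlowGraphConfigSketch):
    """Stand-in for FlowSettings: the base fields plus operational flags."""
    auto_save: bool = False
    modified_on: Optional[float] = None
    show_detailed_progress: bool = True
    is_running: bool = False
    is_canceled: bool = False

    @classmethod
    def from_flow_settings_input(
        cls, flow_graph_config: FlowGraphConfigSketch
    ) -> "FlowSettingsSketch":
        # Copy every base field; the extra fields keep their defaults.
        return cls(**asdict(flow_graph_config))


base = FlowGraphConfigSketch(flow_id=3, name="demo", execution_mode="Development")
settings = FlowSettingsSketch.from_flow_settings_input(base)
```

The resulting `settings` object carries over `flow_id`, `name`, and `execution_mode` from `base`, while `auto_save`, `is_running`, and the other operational flags start at their documented defaults.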
NodeDefault
pydantic-model
Bases: BaseModel
Defines default properties for a node type.
Attributes:
Name | Type | Description |
---|---|---|
node_name | str | The name of the node. |
node_type | NodeTypeLiteral | The functional type of the node ('input', 'output', 'process'). |
transform_type | TransformTypeLiteral | The data transformation behavior ('narrow', 'wide', 'other'). |
has_default_settings | Optional[Any] | Indicates if the node has predefined default settings. |
Show JSON schema:
{
"description": "Defines default properties for a node type.\n\nAttributes:\n node_name (str): The name of the node.\n node_type (NodeTypeLiteral): The functional type of the node ('input', 'output', 'process').\n transform_type (TransformTypeLiteral): The data transformation behavior ('narrow', 'wide', 'other').\n has_default_settings (Optional[Any]): Indicates if the node has predefined default settings.",
"properties": {
"node_name": {
"title": "Node Name",
"type": "string"
},
"node_type": {
"enum": [
"input",
"output",
"process"
],
"title": "Node Type",
"type": "string"
},
"transform_type": {
"enum": [
"narrow",
"wide",
"other"
],
"title": "Transform Type",
"type": "string"
},
"has_default_settings": {
"anyOf": [
{},
{
"type": "null"
}
],
"default": null,
"title": "Has Default Settings"
}
},
"required": [
"node_name",
"node_type",
"transform_type"
],
"title": "NodeDefault",
"type": "object"
}
Fields:
- node_name (str)
- node_type (NodeTypeLiteral)
- transform_type (TransformTypeLiteral)
- has_default_settings (Optional[Any])
Source code in flowfile_core/flowfile_core/schemas/schemas.py
NodeEdge
pydantic-model
Bases: BaseModel
Represents a connection (edge) between two nodes in the frontend.
Attributes:
Name | Type | Description |
---|---|---|
id | str | A unique identifier for the edge. |
source | str | The ID of the source node. |
target | str | The ID of the target node. |
targetHandle | str | The specific input handle on the target node. |
sourceHandle | str | The specific output handle on the source node. |
Show JSON schema:
{
"description": "Represents a connection (edge) between two nodes in the frontend.\n\nAttributes:\n id (str): A unique identifier for the edge.\n source (str): The ID of the source node.\n target (str): The ID of the target node.\n targetHandle (str): The specific input handle on the target node.\n sourceHandle (str): The specific output handle on the source node.",
"properties": {
"id": {
"title": "Id",
"type": "string"
},
"source": {
"title": "Source",
"type": "string"
},
"target": {
"title": "Target",
"type": "string"
},
"targetHandle": {
"title": "Targethandle",
"type": "string"
},
"sourceHandle": {
"title": "Sourcehandle",
"type": "string"
}
},
"required": [
"id",
"source",
"target",
"targetHandle",
"sourceHandle"
],
"title": "NodeEdge",
"type": "object"
}
Config:
- coerce_numbers_to_str: True
Fields:
- id (str)
- source (str)
- target (str)
- targetHandle (str)
- sourceHandle (str)
Source code in flowfile_core/flowfile_core/schemas/schemas.py
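The `coerce_numbers_to_str` setting means that numeric values supplied for the string fields (for example, integer node IDs sent as `source` or `target`) are accepted and converted to strings rather than rejected. The stdlib stand-in below imitates that effect; `NodeEdgeSketch` is hypothetical, the handle name `"output-0"` is illustrative, and excluding booleans is a choice of this sketch rather than a statement about pydantic.

```python
from dataclasses import dataclass, fields
from numbers import Number


@dataclass
class NodeEdgeSketch:
    """Stand-in for NodeEdge that converts numeric field values to strings."""
    id: str
    source: str
    target: str
    targetHandle: str
    sourceHandle: str

    def __post_init__(self) -> None:
        for f in fields(self):
            value = getattr(self, f.name)
            # Convert numbers (but not booleans, by choice) to their string form.
            if isinstance(value, Number) and not isinstance(value, bool):
                setattr(self, f.name, str(value))


edge = NodeEdgeSketch(
    id="1-2",
    source=1,  # numeric ID as a frontend might send it
    target=2,
    targetHandle="input-0",
    sourceHandle="output-0",  # illustrative handle name
)
```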
NodeInformation
pydantic-model
Bases: BaseModel
Stores the state and configuration of a specific node instance within a flow.
Attributes:
Name | Type | Description |
---|---|---|
id | Optional[int] | The unique ID of the node instance. |
type | Optional[str] | The type of the node (e.g., 'join', 'filter'). |
is_setup | Optional[bool] | Whether the node has been configured. |
description | Optional[str] | A user-provided description. |
x_position | Optional[int] | The x-coordinate on the canvas. |
y_position | Optional[int] | The y-coordinate on the canvas. |
left_input_id | Optional[int] | The ID of the node connected to the left input. |
right_input_id | Optional[int] | The ID of the node connected to the right input. |
input_ids | Optional[List[int]] | A list of IDs for main input nodes. |
outputs | Optional[List[int]] | A list of IDs for nodes this node outputs to. |
setting_input | Optional[Any] | The specific settings for this node instance. |
Show JSON schema:
{
"description": "Stores the state and configuration of a specific node instance within a flow.\n\nAttributes:\n id (Optional[int]): The unique ID of the node instance.\n type (Optional[str]): The type of the node (e.g., 'join', 'filter').\n is_setup (Optional[bool]): Whether the node has been configured.\n description (Optional[str]): A user-provided description.\n x_position (Optional[int]): The x-coordinate on the canvas.\n y_position (Optional[int]): The y-coordinate on the canvas.\n left_input_id (Optional[int]): The ID of the node connected to the left input.\n right_input_id (Optional[int]): The ID of the node connected to the right input.\n input_ids (Optional[List[int]]): A list of IDs for main input nodes.\n outputs (Optional[List[int]]): A list of IDs for nodes this node outputs to.\n setting_input (Optional[Any]): The specific settings for this node instance.",
"properties": {
"id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Id"
},
"type": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Type"
},
"is_setup": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"title": "Is Setup"
},
"description": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"title": "Description"
},
"x_position": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"title": "X Position"
},
"y_position": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"title": "Y Position"
},
"left_input_id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Left Input Id"
},
"right_input_id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Right Input Id"
},
"input_ids": {
"anyOf": [
{
"items": {
"type": "integer"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
-1
],
"title": "Input Ids"
},
"outputs": {
"anyOf": [
{
"items": {
"type": "integer"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
-1
],
"title": "Outputs"
},
"setting_input": {
"anyOf": [
{},
{
"type": "null"
}
],
"default": null,
"title": "Setting Input"
}
},
"title": "NodeInformation",
"type": "object"
}
Fields:
- id (Optional[int])
- type (Optional[str])
- is_setup (Optional[bool])
- description (Optional[str])
- x_position (Optional[int])
- y_position (Optional[int])
- left_input_id (Optional[int])
- right_input_id (Optional[int])
- input_ids (Optional[List[int]])
- outputs (Optional[List[int]])
- setting_input (Optional[Any])
Source code in flowfile_core/flowfile_core/schemas/schemas.py
data
property
Property that returns the node's specific settings.
main_input_ids
property
Property that returns the list of main input node IDs.
NodeInput
pydantic-model
Bases: NodeTemplate
Represents a node as it is received from the frontend, including position.
Attributes:
Name | Type | Description |
---|---|---|
id | int | The unique ID of the node instance. |
pos_x | float | The x-coordinate on the canvas. |
pos_y | float | The y-coordinate on the canvas. |
Show JSON schema:
{
"description": "Represents a node as it is received from the frontend, including position.\n\nAttributes:\n id (int): The unique ID of the node instance.\n pos_x (float): The x-coordinate on the canvas.\n pos_y (float): The y-coordinate on the canvas.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"item": {
"title": "Item",
"type": "string"
},
"input": {
"title": "Input",
"type": "integer"
},
"output": {
"title": "Output",
"type": "integer"
},
"image": {
"title": "Image",
"type": "string"
},
"multi": {
"default": false,
"title": "Multi",
"type": "boolean"
},
"node_group": {
"title": "Node Group",
"type": "string"
},
"prod_ready": {
"default": true,
"title": "Prod Ready",
"type": "boolean"
},
"can_be_start": {
"default": false,
"title": "Can Be Start",
"type": "boolean"
},
"id": {
"title": "Id",
"type": "integer"
},
"pos_x": {
"title": "Pos X",
"type": "number"
},
"pos_y": {
"title": "Pos Y",
"type": "number"
}
},
"required": [
"name",
"item",
"input",
"output",
"image",
"node_group",
"id",
"pos_x",
"pos_y"
],
"title": "NodeInput",
"type": "object"
}
Fields:
- name (str)
- item (str)
- input (int)
- output (int)
- image (str)
- multi (bool)
- node_group (str)
- prod_ready (bool)
- can_be_start (bool)
- id (int)
- pos_x (float)
- pos_y (float)
Source code in flowfile_core/flowfile_core/schemas/schemas.py
NodeTemplate
pydantic-model
Bases: BaseModel
Defines the template for a node type, specifying its UI and functional characteristics.
Attributes:
Name | Type | Description |
---|---|---|
name | str | The display name of the node. |
item | str | The unique identifier for the node type. |
input | int | The number of required input connections. |
output | int | The number of output connections. |
image | str | The filename of the icon for the node. |
multi | bool | Whether the node accepts multiple main input connections. |
node_group | str | The category group the node belongs to (e.g., 'input', 'transform'). |
prod_ready | bool | Whether the node is considered production-ready. |
can_be_start | bool | Whether the node can be a starting point in a flow. |
Show JSON schema:
{
"description": "Defines the template for a node type, specifying its UI and functional characteristics.\n\nAttributes:\n name (str): The display name of the node.\n item (str): The unique identifier for the node type.\n input (int): The number of required input connections.\n output (int): The number of output connections.\n image (str): The filename of the icon for the node.\n multi (bool): Whether the node accepts multiple main input connections.\n node_group (str): The category group the node belongs to (e.g., 'input', 'transform').\n prod_ready (bool): Whether the node is considered production-ready.\n can_be_start (bool): Whether the node can be a starting point in a flow.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"item": {
"title": "Item",
"type": "string"
},
"input": {
"title": "Input",
"type": "integer"
},
"output": {
"title": "Output",
"type": "integer"
},
"image": {
"title": "Image",
"type": "string"
},
"multi": {
"default": false,
"title": "Multi",
"type": "boolean"
},
"node_group": {
"title": "Node Group",
"type": "string"
},
"prod_ready": {
"default": true,
"title": "Prod Ready",
"type": "boolean"
},
"can_be_start": {
"default": false,
"title": "Can Be Start",
"type": "boolean"
}
},
"required": [
"name",
"item",
"input",
"output",
"image",
"node_group"
],
"title": "NodeTemplate",
"type": "object"
}
Fields:
- name (str)
- item (str)
- input (int)
- output (int)
- image (str)
- multi (bool)
- node_group (str)
- prod_ready (bool)
- can_be_start (bool)
Source code in flowfile_core/flowfile_core/schemas/schemas.py
RawLogInput
pydantic-model
Bases: BaseModel
Schema for a raw log message.
Attributes:
Name | Type | Description |
---|---|---|
flowfile_flow_id | int | The ID of the flow that generated the log. |
log_message | str | The content of the log message. |
log_type | Literal['INFO', 'ERROR'] | The type of log. |
extra | Optional[dict] | Extra context data for the log. |
Show JSON schema:
{
"description": "Schema for a raw log message.\n\nAttributes:\n flowfile_flow_id (int): The ID of the flow that generated the log.\n log_message (str): The content of the log message.\n log_type (Literal[\"INFO\", \"ERROR\"]): The type of log.\n extra (Optional[dict]): Extra context data for the log.",
"properties": {
"flowfile_flow_id": {
"title": "Flowfile Flow Id",
"type": "integer"
},
"log_message": {
"title": "Log Message",
"type": "string"
},
"log_type": {
"enum": [
"INFO",
"ERROR"
],
"title": "Log Type",
"type": "string"
},
"extra": {
"anyOf": [
{
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"title": "Extra"
}
},
"required": [
"flowfile_flow_id",
"log_message",
"log_type"
],
"title": "RawLogInput",
"type": "object"
}
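A log payload matching the schema above needs the three required fields and one of the two allowed `log_type` values. The sketch below checks a payload against those rules using only the standard library; `raw_log_problems` is a hypothetical helper and the sample values are illustrative.

```python
from typing import Any, Dict, List

# Copied from the RawLogInput schema.
REQUIRED = ("flowfile_flow_id", "log_message", "log_type")
LOG_TYPES = ("INFO", "ERROR")


def raw_log_problems(payload: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list the ways a payload departs from the schema."""
    problems = [f"missing required field: {key}" for key in REQUIRED if key not in payload]
    if "log_type" in payload and payload["log_type"] not in LOG_TYPES:
        problems.append(f"log_type must be one of {list(LOG_TYPES)}")
    extra = payload.get("extra")
    # "extra" is optional and may be an object or null.
    if extra is not None and not isinstance(extra, dict):
        problems.append("extra must be an object or null")
    return problems


ok = raw_log_problems(
    {"flowfile_flow_id": 1, "log_message": "Flow started", "log_type": "INFO"}
)
bad = raw_log_problems({"flowfile_flow_id": 1, "log_message": "x", "log_type": "DEBUG"})
```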
Fields:
- flowfile_flow_id (int)
- log_message (str)
- log_type (Literal['INFO', 'ERROR'])
- extra (Optional[dict])
Source code in flowfile_core/flowfile_core/schemas/schemas.py
VueFlowInput
pydantic-model
Bases: BaseModel
Represents the complete graph structure from the Vue-based frontend.
Attributes:
Name | Type | Description |
---|---|---|
node_edges | List[NodeEdge] | A list of all edges in the graph. |
node_inputs | List[NodeInput] | A list of all nodes in the graph. |
Show JSON schema:
{
"$defs": {
"NodeEdge": {
"description": "Represents a connection (edge) between two nodes in the frontend.\n\nAttributes:\n id (str): A unique identifier for the edge.\n source (str): The ID of the source node.\n target (str): The ID of the target node.\n targetHandle (str): The specific input handle on the target node.\n sourceHandle (str): The specific output handle on the source node.",
"properties": {
"id": {
"title": "Id",
"type": "string"
},
"source": {
"title": "Source",
"type": "string"
},
"target": {
"title": "Target",
"type": "string"
},
"targetHandle": {
"title": "Targethandle",
"type": "string"
},
"sourceHandle": {
"title": "Sourcehandle",
"type": "string"
}
},
"required": [
"id",
"source",
"target",
"targetHandle",
"sourceHandle"
],
"title": "NodeEdge",
"type": "object"
},
"NodeInput": {
"description": "Represents a node as it is received from the frontend, including position.\n\nAttributes:\n id (int): The unique ID of the node instance.\n pos_x (float): The x-coordinate on the canvas.\n pos_y (float): The y-coordinate on the canvas.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"item": {
"title": "Item",
"type": "string"
},
"input": {
"title": "Input",
"type": "integer"
},
"output": {
"title": "Output",
"type": "integer"
},
"image": {
"title": "Image",
"type": "string"
},
"multi": {
"default": false,
"title": "Multi",
"type": "boolean"
},
"node_group": {
"title": "Node Group",
"type": "string"
},
"prod_ready": {
"default": true,
"title": "Prod Ready",
"type": "boolean"
},
"can_be_start": {
"default": false,
"title": "Can Be Start",
"type": "boolean"
},
"id": {
"title": "Id",
"type": "integer"
},
"pos_x": {
"title": "Pos X",
"type": "number"
},
"pos_y": {
"title": "Pos Y",
"type": "number"
}
},
"required": [
"name",
"item",
"input",
"output",
"image",
"node_group",
"id",
"pos_x",
"pos_y"
],
"title": "NodeInput",
"type": "object"
}
},
"description": "Represents the complete graph structure from the Vue-based frontend.\n\nAttributes:\n node_edges (List[NodeEdge]): A list of all edges in the graph.\n node_inputs (List[NodeInput]): A list of all nodes in the graph.",
"properties": {
"node_edges": {
"items": {
"$ref": "#/$defs/NodeEdge"
},
"title": "Node Edges",
"type": "array"
},
"node_inputs": {
"items": {
"$ref": "#/$defs/NodeInput"
},
"title": "Node Inputs",
"type": "array"
}
},
"required": [
"node_edges",
"node_inputs"
],
"title": "VueFlowInput",
"type": "object"
}
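One detail of the schema above is easy to miss: `NodeInput.id` is an integer, while `NodeEdge.source` and `NodeEdge.target` are strings. Any cross-check between edges and nodes therefore has to compare them in a common type. The stdlib-only sketch below shows one way to do that; `dangling_edges` is a hypothetical helper, and the node names, `item` values, image filenames, and the `"output-0"` handle are illustrative.

```python
from typing import Any, Dict, List

# Illustrative payload with two nodes and one edge between them.
payload: Dict[str, Any] = {
    "node_inputs": [
        {"name": "Read data", "item": "read", "input": 0, "output": 1,
         "image": "read.png", "node_group": "input",
         "id": 1, "pos_x": 0.0, "pos_y": 0.0},
        {"name": "Filter data", "item": "filter", "input": 1, "output": 1,
         "image": "filter.png", "node_group": "transform",
         "id": 2, "pos_x": 250.0, "pos_y": 0.0},
    ],
    "node_edges": [
        {"id": "1-2", "source": "1", "target": "2",
         "targetHandle": "input-0", "sourceHandle": "output-0"},
    ],
}


def dangling_edges(graph: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: IDs of edges that point at a node not in node_inputs."""
    # Node IDs are integers and edge endpoints are strings, so compare as strings.
    known = {str(node["id"]) for node in graph["node_inputs"]}
    return [
        edge["id"]
        for edge in graph["node_edges"]
        if edge["source"] not in known or edge["target"] not in known
    ]


missing = dangling_edges(payload)
```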
Fields:
- node_edges (List[NodeEdge])
- node_inputs (List[NodeInput])
Source code in flowfile_core/flowfile_core/schemas/schemas.py
input_schema
flowfile_core.schemas.input_schema
Classes:
Name | Description |
---|---|
DatabaseConnection | Defines the connection parameters for a database. |
DatabaseSettings | Defines settings for reading from a database, either via table or query. |
DatabaseWriteSettings | Defines settings for writing data to a database table. |
ExternalSource | Base model for data coming from a predefined external source. |
FullDatabaseConnection | A complete database connection model including the secret password. |
FullDatabaseConnectionInterface | A database connection model intended for UI display, omitting the password. |
MinimalFieldInfo | Represents the most basic information about a data field (column). |
NewDirectory | Defines the information required to create a new directory. |
NodeBase | Base model for all nodes in a FlowGraph. Contains common metadata. |
NodeCloudStorageReader | Settings for a node that reads from a cloud storage service (S3, GCS, etc.). |
NodeCloudStorageWriter | Settings for a node that writes to a cloud storage service. |
NodeConnection | Represents a connection (edge) between two nodes in the graph. |
NodeCrossJoin | Settings for a node that performs a cross join. |
NodeDatabaseReader | Settings for a node that reads from a database. |
NodeDatabaseWriter | Settings for a node that writes data to a database. |
NodeDatasource | Base settings for a node that acts as a data source. |
NodeDescription | A simple model for updating a node's description text. |
NodeExploreData | Settings for a node that provides an interactive data exploration interface. |
NodeExternalSource | Settings for a node that connects to a registered external data source. |
NodeFilter | Settings for a node that filters rows based on a condition. |
NodeFormula | Settings for a node that applies a formula to create/modify a column. |
NodeFuzzyMatch | Settings for a node that performs a fuzzy join based on string similarity. |
NodeGraphSolver | Settings for a node that solves graph-based problems (e.g., connected components). |
NodeGroupBy | Settings for a node that performs a group-by and aggregation operation. |
NodeInputConnection | Represents the input side of a connection between two nodes. |
NodeJoin | Settings for a node that performs a standard SQL-style join. |
NodeManualInput | Settings for a node that allows direct data entry in the UI. |
NodeMultiInput | A base model for any node that takes multiple data inputs. |
NodeOutput | Settings for a node that writes its input to a file. |
NodeOutputConnection | Represents the output side of a connection between two nodes. |
NodePivot | Settings for a node that pivots data from a long to a wide format. |
NodePolarsCode | Settings for a node that executes arbitrary user-provided Polars code. |
NodePromise | A placeholder node for an operation that has not yet been configured. |
NodeRead | Settings for a node that reads data from a file. |
NodeRecordCount | Settings for a node that counts the number of records. |
NodeRecordId | Settings for a node that adds a unique record ID column. |
NodeSample | Settings for a node that samples a subset of the data. |
NodeSelect | Settings for a node that selects, renames, and reorders columns. |
NodeSingleInput | A base model for any node that takes a single data input. |
NodeSort | Settings for a node that sorts the data by one or more columns. |
NodeTextToRows | Settings for a node that splits a text column into multiple rows. |
NodeUnion | Settings for a node that concatenates multiple data inputs. |
NodeUnique | Settings for a node that returns the unique rows from the data. |
NodeUnpivot | Settings for a node that unpivots data from a wide to a long format. |
OutputCsvTable | Defines settings for writing a CSV file. |
OutputExcelTable | Defines settings for writing an Excel file. |
OutputParquetTable | Defines settings for writing a Parquet file. |
OutputSettings | Defines the complete settings for an output node. |
RawData | Represents data in a raw, columnar format for manual input. |
ReceivedCsvTable | Defines settings for reading a CSV file. |
ReceivedExcelTable | Defines settings for reading an Excel file. |
ReceivedJsonTable | Defines settings for reading a JSON file (inherits from CSV settings). |
ReceivedParquetTable | Defines settings for reading a Parquet file. |
ReceivedTable | A comprehensive model that can represent any type of received table. |
ReceivedTableBase | Base model for defining a table received from an external source. |
RemoveItem | Represents a single item to be removed from a directory or list. |
RemoveItemsInput | Defines a list of items to be removed. |
SampleUsers | Settings for generating a sample dataset of users. |
DatabaseConnection
Bases: BaseModel
Defines the connection parameters for a database.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
DatabaseSettings
Bases: BaseModel
Defines settings for reading from a database, either via table or query.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
DatabaseWriteSettings
Bases: BaseModel
Defines settings for writing data to a database table.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
ExternalSource
Bases: BaseModel
Base model for data coming from a predefined external source.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
FullDatabaseConnection
Bases: BaseModel
A complete database connection model including the secret password.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
FullDatabaseConnectionInterface
Bases: BaseModel
A database connection model intended for UI display, omitting the password.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
MinimalFieldInfo
Bases: BaseModel
Represents the most basic information about a data field (column).
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NewDirectory
Bases: BaseModel
Defines the information required to create a new directory.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeBase
Bases: BaseModel
Base model for all nodes in a FlowGraph. Contains common metadata.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeCloudStorageReader
Bases: NodeBase
Settings for a node that reads from a cloud storage service (S3, GCS, etc.).
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeCloudStorageWriter
Bases: NodeSingleInput
Settings for a node that writes to a cloud storage service.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeConnection
Bases: BaseModel
Represents a connection (edge) between two nodes in the graph.
Methods:
Name | Description |
---|---|
create_from_simple_input | Creates a standard connection between two nodes. |
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
create_from_simple_input(from_id, to_id, input_type='input-0')
classmethod
Creates a standard connection between two nodes.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeCrossJoin
Bases: NodeMultiInput
Settings for a node that performs a cross join.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeDatabaseReader
Bases: NodeBase
Settings for a node that reads from a database.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeDatabaseWriter
Bases: NodeSingleInput
Settings for a node that writes data to a database.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeDatasource
Bases: NodeBase
Base settings for a node that acts as a data source.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeDescription
Bases: BaseModel
A simple model for updating a node's description text.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeExploreData
Bases: NodeBase
Settings for a node that provides an interactive data exploration interface.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeExternalSource
Bases: NodeBase
Settings for a node that connects to a registered external data source.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeFilter
Bases: NodeSingleInput
Settings for a node that filters rows based on a condition.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeFormula
Bases: NodeSingleInput
Settings for a node that applies a formula to create/modify a column.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeFuzzyMatch
Bases: NodeJoin
Settings for a node that performs a fuzzy join based on string similarity.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeGraphSolver
Bases: NodeSingleInput
Settings for a node that solves graph-based problems (e.g., connected components).
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeGroupBy
Bases: NodeSingleInput
Settings for a node that performs a group-by and aggregation operation.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeInputConnection
Bases: BaseModel
Represents the input side of a connection between two nodes.
Methods:
Name | Description |
---|---|
get_node_input_connection_type |
Determines the semantic type of the input (e.g., for a join). |
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
get_node_input_connection_type()
Determines the semantic type of the input (e.g., for a join).
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeJoin
Bases: NodeMultiInput
Settings for a node that performs a standard SQL-style join.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeManualInput
Bases: NodeBase
Settings for a node that allows direct data entry in the UI.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeMultiInput
Bases: NodeBase
A base model for any node that takes multiple data inputs.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeOutput
Bases: NodeSingleInput
Settings for a node that writes its input to a file.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeOutputConnection
Bases: BaseModel
Represents the output side of a connection between two nodes.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodePivot
Bases: NodeSingleInput
Settings for a node that pivots data from a long to a wide format.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodePolarsCode
Bases: NodeMultiInput
Settings for a node that executes arbitrary user-provided Polars code.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodePromise
Bases: NodeBase
A placeholder node for an operation that has not yet been configured.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeRead
Bases: NodeBase
Settings for a node that reads data from a file.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeRecordCount
Bases: NodeSingleInput
Settings for a node that counts the number of records.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeRecordId
Bases: NodeSingleInput
Settings for a node that adds a unique record ID column.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeSample
Bases: NodeSingleInput
Settings for a node that samples a subset of the data.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeSelect
Bases: NodeSingleInput
Settings for a node that selects, renames, and reorders columns.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeSingleInput
Bases: NodeBase
A base model for any node that takes a single data input.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeSort
Bases: NodeSingleInput
Settings for a node that sorts the data by one or more columns.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeTextToRows
Bases: NodeSingleInput
Settings for a node that splits a text column into multiple rows.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeUnion
Bases: NodeMultiInput
Settings for a node that concatenates multiple data inputs.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeUnique
Bases: NodeSingleInput
Settings for a node that returns the unique rows from the data.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
NodeUnpivot
Bases: NodeSingleInput
Settings for a node that unpivots data from a wide to a long format.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
OutputCsvTable
Bases: BaseModel
Defines settings for writing a CSV file.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
OutputExcelTable
Bases: BaseModel
Defines settings for writing an Excel file.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
OutputParquetTable
Bases: BaseModel
Defines settings for writing a Parquet file.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
OutputSettings
Bases: BaseModel
Defines the complete settings for an output node.
Methods:
Name | Description |
---|---|
populate_abs_file_path |
Ensures the absolute file path is populated after validation. |
set_absolute_filepath |
Resolves the output directory and name into an absolute path. |
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
populate_abs_file_path()
Ensures the absolute file path is populated after validation.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
set_absolute_filepath()
Resolves the output directory and name into an absolute path.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
RawData
Bases: BaseModel
Represents data in a raw, columnar format for manual input.
Methods:
Name | Description |
---|---|
from_pylist |
Creates a RawData object from a list of Python dictionaries. |
to_pylist |
Converts the RawData object back into a list of Python dictionaries. |
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
from_pylist(pylist)
classmethod
Creates a RawData object from a list of Python dictionaries.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
to_pylist()
Converts the RawData object back into a list of Python dictionaries.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
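The round trip described by from_pylist and to_pylist can be illustrated with plain Python. This is a minimal sketch, not the RawData implementation: the fields of RawData are not shown here, so the helper names rows_to_columns and columns_to_rows are hypothetical, and the sketch assumes that the first row defines the columns.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn a list of row dictionaries into a column-name -> values mapping."""
    if not rows:
        return {}
    # Assumption: the first row defines the column names and their order.
    names = list(rows[0].keys())
    return {name: [row.get(name) for row in rows] for name in names}


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn the columnar mapping back into a list of row dictionaries."""
    if not columns:
        return []
    names = list(columns.keys())
    n_rows = len(columns[names[0]])
    return [{name: columns[name][i] for name in names} for i in range(n_rows)]


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
columnar = rows_to_columns(rows)
round_trip = columns_to_rows(columnar)
```

Keeping the data columnar makes it cheap to attach a type to each column, which is presumably why manual input is stored that way.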
ReceivedCsvTable
Bases: ReceivedTableBase
Defines settings for reading a CSV file.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
ReceivedExcelTable
Bases: ReceivedTableBase
Defines settings for reading an Excel file.
Methods:
Name | Description |
---|---|
validate_range_values |
Validates that the Excel cell range is logical. |
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
validate_range_values()
Validates that the Excel cell range is logical.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
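What "logical" might mean for a cell range can be sketched as follows. The class CellRange and its field names are hypothetical stand-ins, and treating 0 as "no explicit end" is an assumption; the actual checks in validate_range_values are not shown in this reference.

```python
from dataclasses import dataclass


@dataclass
class CellRange:
    """Hypothetical stand-in for the range fields of an Excel read setting."""

    start_row: int = 0
    end_row: int = 0  # assumption: 0 means "no explicit end"
    start_column: int = 0
    end_column: int = 0  # assumption: 0 means "no explicit end"

    def __post_init__(self) -> None:
        # A range is accepted when each explicit end is not before its start.
        if self.end_row and self.end_row < self.start_row:
            raise ValueError("end_row must not be smaller than start_row")
        if self.end_column and self.end_column < self.start_column:
            raise ValueError("end_column must not be smaller than start_column")


def is_logical(**kwargs: int) -> bool:
    """Return True when the range validates, False when it is rejected."""
    try:
        CellRange(**kwargs)
    except ValueError:
        return False
    return True
```

Running the check at construction time means an invalid range is rejected before any file is opened.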
ReceivedJsonTable
Bases: ReceivedCsvTable
Defines settings for reading a JSON file (inherits from CSV settings).
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
ReceivedParquetTable
Bases: ReceivedTableBase
Defines settings for reading a Parquet file.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
ReceivedTable
Bases: ReceivedExcelTable, ReceivedCsvTable, ReceivedParquetTable
A comprehensive model that can represent any type of received table.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
ReceivedTableBase
Bases: BaseModel
Base model for defining a table received from an external source.
Methods:
Name | Description |
---|---|
create_from_path |
Creates an instance from a file path string. |
populate_abs_file_path |
Ensures the absolute file path is populated after validation. |
set_absolute_filepath |
Resolves the path to an absolute file path. |
Attributes:
Name | Type | Description |
---|---|---|
file_path |
str
|
Constructs the full file path from the directory and name. |
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
file_path
property
Constructs the full file path from the directory and name.
create_from_path(path)
classmethod
Creates an instance from a file path string.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
populate_abs_file_path()
Ensures the absolute file path is populated after validation.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
set_absolute_filepath()
Resolves the path to an absolute file path.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
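The relationship between create_from_path, file_path, and the absolute path can be sketched with the standard library. TableRef is a hypothetical stand-in, not ReceivedTableBase, and the field names directory and name are assumptions based on the description "from the directory and name".

```python
import os
from dataclasses import dataclass


@dataclass
class TableRef:
    """Hypothetical stand-in; the field names are assumptions."""

    directory: str
    name: str

    @classmethod
    def create_from_path(cls, path: str) -> "TableRef":
        # Split the path into its directory part and its file name.
        directory, name = os.path.split(path)
        return cls(directory=directory, name=name)

    @property
    def file_path(self) -> str:
        # Rebuild the full path from the two stored parts.
        return os.path.join(self.directory, self.name)

    @property
    def abs_file_path(self) -> str:
        # Resolve relative paths against the current working directory.
        return os.path.abspath(self.file_path)


ref = TableRef.create_from_path(os.path.join("data", "sales.csv"))
```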
RemoveItem
Bases: BaseModel
Represents a single item to be removed from a directory or list.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
RemoveItemsInput
Bases: BaseModel
Defines a list of items to be removed.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
SampleUsers
Bases: ExternalSource
Settings for generating a sample dataset of users.
Source code in flowfile_core/flowfile_core/schemas/input_schema.py
transform_schema
flowfile_core.schemas.transform_schema
Classes:
Name | Description |
---|---|
AggColl |
A data class that represents a single aggregation operation for a group by operation. |
BasicFilter |
Defines a simple, single-condition filter (e.g., 'column' 'equals' 'value'). |
CrossJoinInput |
Defines the settings for a cross join operation, including column selections for both inputs. |
FieldInput |
Represents a single field with its name and data type, typically for defining an output column. |
FilterInput |
Defines the settings for a filter operation, supporting basic or advanced (expression-based) modes. |
FullJoinKeyResponse |
Holds the join key rename responses for both sides of a join. |
FunctionInput |
Defines a formula to be applied, including the output field information. |
FuzzyMap |
Extends JoinMap with settings for fuzzy string matching, such as the algorithm and similarity threshold. |
FuzzyMatchInput |
Extends JoinInput with settings specific to fuzzy matching, such as the matching algorithm and threshold. |
GraphSolverInput |
Defines settings for a graph-solving operation (e.g., finding connected components). |
GroupByInput |
A data class that represents the input for a group by operation. |
JoinInput |
Defines the settings for a standard SQL-style join, including keys, strategy, and selections. |
JoinInputs |
Extends SelectInputs with functionality specific to join operations, like handling join keys. |
JoinKeyRename |
Represents the renaming of a join key from its original to a temporary name. |
JoinKeyRenameResponse |
Contains a list of join key renames for one side of a join. |
JoinMap |
Defines a single mapping between a left and right column for a join key. |
JoinSelectMixin |
A mixin providing common methods for join-like operations that involve left and right inputs. |
PivotInput |
Defines the settings for a pivot (long-to-wide) operation. |
PolarsCodeInput |
A simple container for a string of user-provided Polars code to be executed. |
RecordIdInput |
Defines settings for adding a record ID (row number) column to the data. |
SelectInput |
Defines how a single column should be selected, renamed, or type-cast. |
SelectInputs |
A container for a list of SelectInput objects, providing helper methods for managing selections. |
SortByInput |
Defines a single sort condition on a column, including the direction. |
TextToRowsInput |
Defines settings for splitting a text column into multiple rows based on a delimiter. |
UnionInput |
Defines settings for a union (concatenation) operation. |
UniqueInput |
Defines settings for a uniqueness operation, specifying columns and which row to keep. |
UnpivotInput |
Defines settings for an unpivot (wide-to-long) operation. |
Functions:
Name | Description |
---|---|
construct_join_key_name |
Creates a temporary, unique name for a join key column. |
get_func_type_mapping |
Infers the output data type of common aggregation functions. |
string_concat |
A simple wrapper to concatenate string columns in Polars. |
AggColl
dataclass
A data class that represents a single aggregation operation for a group by operation.
Attributes
old_name : str The name of the column in the original DataFrame to be aggregated.
agg : Any The aggregation function to use. This can be a string representing a built-in function or a custom function.
new_name : Optional[str] The name of the resulting aggregated column in the output DataFrame. If not provided, it defaults to old_name with the aggregation function appended.
output_type : Optional[str] The type of the output values of the aggregation. If not provided, it is inferred from the aggregation function using the get_func_type_mapping function.
Example
agg_col = AggColl(old_name='col1', agg='sum', new_name='sum_col1', output_type='float')
Methods:
Name | Description |
---|---|
__init__ |
Initializes an aggregation column with its source, function, and new name. |
Attributes:
Name | Type | Description |
---|---|---|
agg_func |
Returns the corresponding Polars aggregation function from the agg string. |
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
agg_func
property
Returns the corresponding Polars aggregation function from the agg string.
__init__(old_name, agg, new_name=None, output_type=None)
Initializes an aggregation column with its source, function, and new name.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
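The fallback for an omitted output name can be sketched without the library. This is an illustrative stand-in, not the AggColl implementation: the helper name default_agg_name is hypothetical, and the underscore separator is an assumption, since the description only says that the aggregation function is appended to the old name.

```python
from typing import Optional


def default_agg_name(old_name: str, agg: str, new_name: Optional[str] = None) -> str:
    """Return new_name if given, otherwise derive one from old_name and agg."""
    if new_name is not None:
        return new_name
    # Assumption: an underscore joins the two parts; the real separator may differ.
    return f"{old_name}_{agg}"


derived = default_agg_name("col1", "sum")
explicit = default_agg_name("col1", "sum", new_name="sum_col1")
```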
BasicFilter
dataclass
Defines a simple, single-condition filter (e.g., 'column' 'equals' 'value').
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
CrossJoinInput
dataclass
Bases: JoinSelectMixin
Defines the settings for a cross join operation, including column selections for both inputs.
Methods:
Name | Description |
---|---|
__init__ |
Initializes the CrossJoinInput with selections for left and right tables. |
auto_rename |
Automatically renames columns on the right side to prevent naming conflicts. |
Attributes:
Name | Type | Description |
---|---|---|
overlapping_records |
Finds column names that would conflict after the join. |
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
overlapping_records
property
Finds column names that would conflict after the join.
__init__(left_select, right_select)
Initializes the CrossJoinInput with selections for left and right tables.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
auto_rename()
Automatically renames columns on the right side to prevent naming conflicts.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
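Conflict detection and right-side renaming, as described for overlapping_records and auto_rename, can be sketched on plain lists of column names. The function names and the "right_" prefix are assumptions made for illustration; the naming scheme used by the library is not shown here.

```python
from typing import Dict, List


def overlapping(left: List[str], right: List[str]) -> List[str]:
    """Return the right-side column names that also exist on the left."""
    left_set = set(left)
    return [col for col in right if col in left_set]


def auto_rename_right(left: List[str], right: List[str], prefix: str = "right_") -> Dict[str, str]:
    """Map each right-side column to a name that is unused in the output."""
    taken = set(left)
    mapping: Dict[str, str] = {}
    for col in right:
        new_name = col
        # Keep prefixing until the name no longer collides with a taken name.
        while new_name in taken:
            new_name = prefix + new_name
        taken.add(new_name)
        mapping[col] = new_name
    return mapping


conflicts = overlapping(["id", "name"], ["id", "city"])
renames = auto_rename_right(["id", "name"], ["id", "city"])
```

Only the right side is renamed, so left-side column names stay stable for downstream nodes.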
FieldInput
dataclass
Represents a single field with its name and data type, typically for defining an output column.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
FilterInput
dataclass
Defines the settings for a filter operation, supporting basic or advanced (expression-based) modes.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
FullJoinKeyResponse
Bases: NamedTuple
Holds the join key rename responses for both sides of a join.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
FunctionInput
dataclass
Defines a formula to be applied, including the output field information.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
FuzzyMap
dataclass
Bases: JoinMap
Extends JoinMap with settings for fuzzy string matching, such as the algorithm and similarity threshold.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
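The role of a similarity threshold can be shown with the standard library. The sketch below swaps in difflib.SequenceMatcher as the similarity measure; it is not one of the algorithms FuzzyMap offers, and the function names and the 0.8 default are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_pairs(left: List[str], right: List[str], threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Return every (left, right) pair whose similarity meets the threshold."""
    return [(l, r) for l in left for r in right if similarity(l, r) >= threshold]


matches = fuzzy_pairs(["Jon Smith"], ["John Smith", "Alice Wong"])
```

Raising the threshold trades recall for precision: fewer near-misses are joined, but genuine variants may be dropped.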
FuzzyMatchInput
dataclass
Bases: JoinInput
Extends JoinInput with settings specific to fuzzy matching, such as the matching algorithm and threshold.
Attributes:
Name | Type | Description |
---|---|---|
fuzzy_maps |
List[FuzzyMap]
|
Returns the final fuzzy mappings after applying all column renames. |
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
fuzzy_maps
property
Returns the final fuzzy mappings after applying all column renames.
GraphSolverInput
dataclass
Defines settings for a graph-solving operation (e.g., finding connected components).
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
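Connected components, the example the description gives, can be computed with a union-find structure. This is a generic sketch of that technique over an edge list, not the library's solver; the function name and the choice of a root node as the component label are assumptions.

```python
from typing import Dict, Hashable, Iterable, Tuple


def connected_components(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Hashable]:
    """Map every node to a representative node of its component."""
    parent: Dict[Hashable, Hashable] = {}

    def find(node: Hashable) -> Hashable:
        parent.setdefault(node, node)
        # Path halving: point nodes closer to the root while walking up.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    return {node: find(node) for node in list(parent)}


labels = connected_components([("a", "b"), ("b", "c"), ("x", "y")])
```

In a tabular setting, each edge would come from a row with a "from" and a "to" column, and the label would be written back as a new column.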
GroupByInput
dataclass
A data class that represents the input for a group by operation.
Attributes
group_columns : List[str] A list of column names to group the DataFrame by. These column(s) will be set as the DataFrame index.
agg_cols : List[AggColl] A list of AggColl objects that specify the aggregation operations to perform on the DataFrame columns after grouping. Each AggColl object should specify the column to be aggregated and the aggregation function to use.
Example
group_by_input = GroupByInput(
    agg_cols=[
        AggColl(old_name='ix', agg='groupby'),
        AggColl(old_name='groups', agg='groupby'),
        AggColl(old_name='col1', agg='sum'),
        AggColl(old_name='col2', agg='mean'),
    ]
)
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
JoinInput
dataclass
Bases: JoinSelectMixin
Defines the settings for a standard SQL-style join, including keys, strategy, and selections.
Methods:
Name | Description |
---|---|
__init__ |
Initializes the JoinInput with keys, selections, and join strategy. |
auto_rename |
Automatically renames columns on the right side to prevent naming conflicts. |
get_join_key_renames |
Gets the temporary rename mappings for the join keys on both sides. |
parse_join_mapping |
Parses various input formats for join keys into a standardized list of JoinMap objects. |
set_join_keys |
Marks the SelectInput objects corresponding to join keys. |
Attributes:
Name | Type | Description |
---|---|---|
left_join_keys |
List[str]
|
Returns an ordered list of the left-side join key column names to be used in the join. |
right_join_keys |
List[str]
|
Returns an ordered list of the right-side join key column names to be used in the join. |
used_join_mapping |
List[JoinMap]
|
Returns the final join mapping after applying all renames and transformations. |
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
left_join_keys
property
Returns an ordered list of the left-side join key column names to be used in the join.
right_join_keys
property
Returns an ordered list of the right-side join key column names to be used in the join.
used_join_mapping
property
Returns the final join mapping after applying all renames and transformations.
__init__(join_mapping, left_select, right_select, how='inner')
Initializes the JoinInput with keys, selections, and join strategy.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
auto_rename()
Automatically renames columns on the right side to prevent naming conflicts.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
get_join_key_renames(filter_drop=False)
Gets the temporary rename mappings for the join keys on both sides.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
parse_join_mapping(join_mapping)
staticmethod
Parses various input formats for join keys into a standardized list of JoinMap objects.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
set_join_keys()
Marks the SelectInput objects corresponding to join keys.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
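Normalizing several key formats into one list of mappings, as parse_join_mapping is described to do, might look like the sketch below. KeyPair is a hypothetical stand-in for JoinMap with assumed field names, and the accepted formats (a bare string, a two-string tuple, or a list of either) are assumptions; the formats the library accepts are not listed in this reference.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class KeyPair:
    """Hypothetical stand-in for JoinMap; field names are assumptions."""

    left_col: str
    right_col: str


KeySpec = Union[str, Tuple[str, str], KeyPair]


def _to_pair(item: KeySpec) -> KeyPair:
    if isinstance(item, KeyPair):
        return item
    if isinstance(item, str):
        # A bare name means the same column name on both sides.
        return KeyPair(item, item)
    left, right = item
    return KeyPair(left, right)


def parse_keys(spec: Union[KeySpec, Sequence[KeySpec]]) -> List[KeyPair]:
    """Normalize a single key or a sequence of keys into a list of KeyPair."""
    is_pair_tuple = (
        isinstance(spec, tuple)
        and len(spec) == 2
        and all(isinstance(part, str) for part in spec)
    )
    if isinstance(spec, (str, KeyPair)) or is_pair_tuple:
        return [_to_pair(spec)]
    return [_to_pair(item) for item in spec]
```

For example, parse_keys(["id", ("a", "b")]) yields one same-name key and one left/right key, so the rest of the join logic only ever sees one shape.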
JoinInputs
Bases: SelectInputs
Extends SelectInputs with functionality specific to join operations, like handling join keys.
Methods:
Name | Description |
---|---|
get_join_key_rename_mapping |
Returns a dictionary mapping original join key names to their temporary names. |
get_join_key_renames |
Gets the temporary rename mapping for all join keys on one side of a join. |
Attributes:
Name | Type | Description |
---|---|---|
join_key_selects |
List[SelectInput]
|
Returns only the SelectInput objects that are marked as join keys. |
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
join_key_selects
property
Returns only the SelectInput objects that are marked as join keys.
get_join_key_rename_mapping(side)
Returns a dictionary mapping original join key names to their temporary names.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
get_join_key_renames(side, filter_drop=False)
Gets the temporary rename mapping for all join keys on one side of a join.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
JoinKeyRename
Bases: NamedTuple
Represents the renaming of a join key from its original to a temporary name.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
JoinKeyRenameResponse
Bases: NamedTuple
Contains a list of join key renames for one side of a join.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
JoinMap
dataclass
Defines a single mapping between a left and right column for a join key.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
JoinSelectMixin
A mixin providing common methods for join-like operations that involve left and right inputs.
Methods:
Name | Description |
---|---|
add_new_select_column |
Adds a new column to the selection for either the left or right side. |
auto_generate_new_col_name |
Generates a new, non-conflicting column name by adding a suffix if necessary. |
parse_select |
Parses various input formats into a standardized JoinInputs object. |
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
add_new_select_column(select_input, side)
Adds a new column to the selection for either the left or right side.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
auto_generate_new_col_name(old_col_name, side)
Generates a new, non-conflicting column name by adding a suffix if necessary.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
parse_select(select)
staticmethod
Parses various input formats into a standardized JoinInputs object.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
PivotInput
dataclass
Defines the settings for a pivot (long-to-wide) operation.
Methods:
Name | Description |
---|---|
get_group_by_input |
Constructs the GroupByInput needed for the pre-aggregation step of the pivot. |
get_pivot_column |
Returns the pivot column as a Polars column expression. |
get_values_expr |
Creates the struct expression used to gather the values for pivoting. |
Attributes:
Name | Type | Description |
---|---|---|
grouped_columns |
List[str]
|
Returns the list of columns to be used for the initial grouping stage of the pivot. |
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
grouped_columns
property
Returns the list of columns to be used for the initial grouping stage of the pivot.
get_group_by_input()
Constructs the GroupByInput needed for the pre-aggregation step of the pivot.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
get_pivot_column()
Returns the pivot column as a Polars column expression.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
get_values_expr()
Creates the struct expression used to gather the values for pivoting.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
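The two stages that the PivotInput methods hint at, a pre-aggregation grouping followed by spreading values into columns, can be sketched in plain Python. This sketch does not use Polars or PivotInput; the function name, the single index column, and the use of sum as the aggregation are assumptions for illustration.

```python
from typing import Any, Dict, List, Tuple


def pivot_sum(rows: List[Dict[str, Any]], index_col: str, pivot_col: str, value_col: str) -> List[Dict[str, Any]]:
    """Pivot long rows to wide rows, summing values that share a cell."""
    # Stage 1: pre-aggregate per (index value, pivot value) pair.
    totals: Dict[Tuple[Any, Any], Any] = {}
    for row in rows:
        key = (row[index_col], row[pivot_col])
        totals[key] = totals.get(key, 0) + row[value_col]

    # Stage 2: spread each pivot value into its own column.
    wide: Dict[Any, Dict[str, Any]] = {}
    for (index_value, pivot_value), total in totals.items():
        wide.setdefault(index_value, {index_col: index_value})[pivot_value] = total
    return list(wide.values())


long_rows = [
    {"region": "EU", "year": "2023", "sales": 10},
    {"region": "EU", "year": "2023", "sales": 5},
    {"region": "EU", "year": "2024", "sales": 7},
    {"region": "US", "year": "2023", "sales": 3},
]
wide_rows = pivot_sum(long_rows, "region", "year", "sales")
```

Note that a combination missing from the input (US in 2024 here) simply has no key in the wide row; a tabular engine would typically fill it with a null.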
PolarsCodeInput
dataclass
A simple container for a string of user-provided Polars code to be executed.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
RecordIdInput
dataclass
Defines settings for adding a record ID (row number) column to the data.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
SelectInput
dataclass
Defines how a single column should be selected, renamed, or type-cast.
This is a core building block for any operation that involves column manipulation. It holds all the configuration for a single field in a selection operation.
Attributes:
Name | Type | Description |
---|---|---|
polars_type |
str
|
Translates a user-friendly type name to a Polars data type string. |
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
polars_type
property
Translates a user-friendly type name to a Polars data type string.
SelectInputs
dataclass
A container for a list of SelectInput objects, providing helper methods for managing selections.
Methods:
Name | Description |
---|---|
__add__ |
Allows adding a SelectInput using the '+' operator. |
append |
Appends a new SelectInput to the list of renames. |
create_from_list |
Creates a SelectInputs object from a simple list of column names. |
create_from_pl_df |
Creates a SelectInputs object from a Polars DataFrame's columns. |
get_select_cols |
Gets a list of original column names to select from the source DataFrame. |
remove_select_input |
Removes a SelectInput from the list based on its original name. |
unselect_field |
Marks a field to be dropped from the final selection by setting keep to False. |
Attributes:
Name | Type | Description |
---|---|---|
new_cols |
Set
|
Returns a set of new (renamed) column names to be kept in the selection. |
old_cols |
Set
|
Returns a set of original column names to be kept in the selection. |
rename_table |
Generates a dictionary for use in Polars' .rename() method. |
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
new_cols
property
Returns a set of new (renamed) column names to be kept in the selection.
old_cols
property
Returns a set of original column names to be kept in the selection.
rename_table
property
Generates a dictionary for use in Polars' .rename() method.
__add__(other)
Allows adding a SelectInput using the '+' operator.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
append(other)
Appends a new SelectInput to the list of renames.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
create_from_list(col_list)
classmethod
Creates a SelectInputs object from a simple list of column names.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
create_from_pl_df(df)
classmethod
Creates a SelectInputs object from a Polars DataFrame's columns.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
get_select_cols(include_join_key=True)
Gets a list of original column names to select from the source DataFrame.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
remove_select_input(old_key)
Removes a SelectInput from the list based on its original name.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
unselect_field(old_key)
Marks a field to be dropped from the final selection by setting keep to False.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
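How the container's properties relate to the keep flag can be sketched with two small stand-in classes. Pick and Picks are hypothetical and are not SelectInput or SelectInputs; the field names old_name and new_name are assumptions, while keep and the "list of renames" wording come from the descriptions above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Pick:
    """Hypothetical stand-in for SelectInput."""

    old_name: str
    new_name: Optional[str] = None
    keep: bool = True

    def __post_init__(self) -> None:
        # Assumption: an omitted new name means "keep the original name".
        if self.new_name is None:
            self.new_name = self.old_name


@dataclass
class Picks:
    """Hypothetical stand-in for SelectInputs."""

    renames: List[Pick] = field(default_factory=list)

    @classmethod
    def create_from_list(cls, col_list: List[str]) -> "Picks":
        return cls([Pick(name) for name in col_list])

    @property
    def old_cols(self) -> Set[str]:
        return {p.old_name for p in self.renames if p.keep}

    @property
    def new_cols(self) -> Set[str]:
        return {p.new_name for p in self.renames if p.keep}

    @property
    def rename_table(self) -> Dict[str, str]:
        return {p.old_name: p.new_name for p in self.renames if p.keep}

    def unselect_field(self, old_key: str) -> None:
        # Dropping a field only flips its flag; the entry itself stays in the list.
        for p in self.renames:
            if p.old_name == old_key:
                p.keep = False


picks = Picks.create_from_list(["id", "name"])
picks.renames.append(Pick("amt", "amount"))
picks.unselect_field("name")
```

Flagging instead of deleting keeps the full column list available, so a dropped column can be re-selected later without re-reading the schema.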
SortByInput
dataclass
Defines a single sort condition on a column, including the direction.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
TextToRowsInput
dataclass
Defines settings for splitting a text column into multiple rows based on a delimiter.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
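Splitting a delimited text column into multiple rows can be sketched in a few lines of plain Python. The function name is hypothetical, and stripping whitespace around each part is a choice made for this sketch rather than documented behavior.

```python
from typing import Any, Dict, List


def text_to_rows(rows: List[Dict[str, Any]], column: str, delimiter: str = ",") -> List[Dict[str, Any]]:
    """Emit one output row per delimited part of the given column."""
    result: List[Dict[str, Any]] = []
    for row in rows:
        for part in str(row[column]).split(delimiter):
            # Copy the other columns unchanged and replace the split column.
            result.append({**row, column: part.strip()})
    return result


expanded = text_to_rows([{"id": 1, "tags": "a, b"}, {"id": 2, "tags": "c"}], "tags")
```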
UnionInput
dataclass
Defines settings for a union (concatenation) operation.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
UniqueInput
dataclass
Defines settings for a uniqueness operation, specifying columns and which row to keep.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
UnpivotInput
dataclass
Defines settings for an unpivot (wide-to-long) operation.
Methods:
Name | Description |
---|---|
__post_init__ | Ensures that list attributes are initialized correctly if they are None. |
Attributes:
Name | Type | Description |
---|---|---|
data_type_selector_expr | Optional[Callable] | Returns a Polars selector function based on the data_type_selector string. |
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
data_type_selector_expr
property
Returns a Polars selector function based on the data_type_selector string.
__post_init__()
Ensures that list attributes are initialized correctly if they are None.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
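The two ideas in this class, normalizing None lists in `__post_init__` and reshaping wide data to long, can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `UnpivotSketch`, its attribute names, and the output column names `variable` and `value` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class UnpivotSketch:
    # Hypothetical stand-in for UnpivotInput; attribute names are assumptions.
    index_columns: Optional[List[str]] = None
    value_columns: Optional[List[str]] = None

    def __post_init__(self) -> None:
        # Replace None with an empty list so later code can iterate safely.
        if self.index_columns is None:
            self.index_columns = []
        if self.value_columns is None:
            self.value_columns = []


def unpivot(rows: List[Dict[str, Any]], settings: UnpivotSketch) -> List[Dict[str, Any]]:
    """Turn each value column of each row into its own output row."""
    long_rows: List[Dict[str, Any]] = []
    for row in rows:
        for column in settings.value_columns:
            long_row = {key: row[key] for key in settings.index_columns}
            long_row["variable"] = column  # assumed output column name
            long_row["value"] = row[column]  # assumed output column name
            long_rows.append(long_row)
    return long_rows


wide = [{"id": 1, "q1": 10, "q2": 20}]
long_rows = unpivot(wide, UnpivotSketch(index_columns=["id"], value_columns=["q1", "q2"]))
empty_settings = UnpivotSketch()
```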
construct_join_key_name(side, column_name)
Creates a temporary, unique name for a join key column.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
get_func_type_mapping(func)
Infers the output data type of common aggregation functions.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
string_concat(*column)
A simple wrapper to concatenate string columns in Polars.
Source code in flowfile_core/flowfile_core/schemas/transform_schema.py
cloud_storage_schemas
flowfile_core.schemas.cloud_storage_schemas
Cloud storage connection schemas for S3, ADLS, and other cloud providers.
Classes:
Name | Description |
---|---|
AuthSettingsInput | The information a user must provide to connect to the cloud provider. |
CloudStorageReadSettings | Settings for reading from cloud storage |
CloudStorageSettings | Settings for cloud storage nodes in the visual designer |
CloudStorageWriteSettings | Settings for writing to cloud storage |
CloudStorageWriteSettingsWorkerInterface | Settings for writing to cloud storage in worker context |
FullCloudStorageConnection | Internal model with decrypted secrets |
FullCloudStorageConnectionInterface | API response model - no secrets exposed |
FullCloudStorageConnectionWorkerInterface | Internal model with decrypted secrets |
WriteSettingsWorkerInterface | Settings for writing to cloud storage |
Functions:
Name | Description |
---|---|
encrypt_for_worker | Encrypts a secret value for use in worker contexts. |
get_cloud_storage_write_settings_worker_interface | Convert to a worker interface model with hashed secrets. |
AuthSettingsInput
pydantic-model
Bases: BaseModel
The information a user must provide to connect to the cloud provider.
Show JSON schema:
{
"description": "The information needed for the user to provide the details that are needed to provide how to connect to the\n Cloud provider",
"properties": {
"storage_type": {
"enum": [
"s3",
"adls",
"gcs"
],
"title": "Storage Type",
"type": "string"
},
"auth_method": {
"enum": [
"access_key",
"iam_role",
"service_principal",
"managed_identity",
"sas_token",
"aws-cli",
"env_vars"
],
"title": "Auth Method",
"type": "string"
},
"connection_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "None",
"title": "Connection Name"
}
},
"required": [
"storage_type",
"auth_method"
],
"title": "AuthSettingsInput",
"type": "object"
}
Fields:
- storage_type (CloudStorageType)
- auth_method (AuthMethod)
- connection_name (Optional[str])
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
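The JSON schema above can be checked by hand before a payload is sent to the API. The sketch below uses only the enum values and required keys listed in the schema; the function name `check_auth_settings` is hypothetical, and the real model performs its validation through Pydantic rather than through code like this.

```python
from typing import Any, Dict, List

# Enum values copied from the AuthSettingsInput JSON schema above.
STORAGE_TYPES = {"s3", "adls", "gcs"}
AUTH_METHODS = {
    "access_key", "iam_role", "service_principal", "managed_identity",
    "sas_token", "aws-cli", "env_vars",
}


def check_auth_settings(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors: List[str] = []
    for key, allowed in (("storage_type", STORAGE_TYPES), ("auth_method", AUTH_METHODS)):
        if key not in payload:
            errors.append(f"{key} is required")
        elif payload[key] not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}")
    name = payload.get("connection_name")
    if name is not None and not isinstance(name, str):
        errors.append("connection_name must be a string or null")
    return errors


ok_errors = check_auth_settings({"storage_type": "s3", "auth_method": "access_key"})
bad_errors = check_auth_settings({"storage_type": "ftp"})
```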
CloudStorageReadSettings
pydantic-model
Bases: CloudStorageSettings
Settings for reading from cloud storage
Show JSON schema:
{
"description": "Settings for reading from cloud storage",
"properties": {
"auth_mode": {
"default": "auto",
"enum": [
"access_key",
"iam_role",
"service_principal",
"managed_identity",
"sas_token",
"aws-cli",
"env_vars"
],
"title": "Auth Mode",
"type": "string"
},
"connection_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Connection Name"
},
"resource_path": {
"title": "Resource Path",
"type": "string"
},
"scan_mode": {
"default": "single_file",
"enum": [
"single_file",
"directory"
],
"title": "Scan Mode",
"type": "string"
},
"file_format": {
"default": "parquet",
"enum": [
"csv",
"parquet",
"json",
"delta",
"iceberg"
],
"title": "File Format",
"type": "string"
},
"csv_has_header": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Csv Has Header"
},
"csv_delimiter": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": ",",
"title": "Csv Delimiter"
},
"csv_encoding": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "utf8",
"title": "Csv Encoding"
},
"delta_version": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Delta Version"
}
},
"required": [
"resource_path"
],
"title": "CloudStorageReadSettings",
"type": "object"
}
Fields:
- auth_mode (AuthMethod)
- connection_name (Optional[str])
- resource_path (str)
- scan_mode (Literal['single_file', 'directory'])
- file_format (Literal['csv', 'parquet', 'json', 'delta', 'iceberg'])
- csv_has_header (Optional[bool])
- csv_delimiter (Optional[str])
- csv_encoding (Optional[str])
- delta_version (Optional[int])
Validators:
- validate_auth_requirements → auth_mode
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
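Only `resource_path` is required; every other field has a default in the schema. As a rough sketch, the helper below fills those defaults into a partial payload. The name `with_read_defaults` and the example bucket path are hypothetical, and the default values are copied from the JSON schema above exactly as listed there.

```python
from typing import Any, Dict

# Defaults copied from the CloudStorageReadSettings JSON schema above.
READ_DEFAULTS: Dict[str, Any] = {
    "auth_mode": "auto",
    "connection_name": None,
    "scan_mode": "single_file",
    "file_format": "parquet",
    "csv_has_header": True,
    "csv_delimiter": ",",
    "csv_encoding": "utf8",
    "delta_version": None,
}


def with_read_defaults(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a partial payload over the schema defaults."""
    if "resource_path" not in payload:
        raise ValueError("resource_path is required")
    return {**READ_DEFAULTS, **payload}


# The bucket path is a made-up example value.
csv_directory = with_read_defaults(
    {"resource_path": "s3://example-bucket/data/", "scan_mode": "directory", "file_format": "csv"}
)
```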
CloudStorageSettings
pydantic-model
Bases: BaseModel
Settings for cloud storage nodes in the visual designer
Show JSON schema:
{
"description": "Settings for cloud storage nodes in the visual designer",
"properties": {
"auth_mode": {
"default": "auto",
"enum": [
"access_key",
"iam_role",
"service_principal",
"managed_identity",
"sas_token",
"aws-cli",
"env_vars"
],
"title": "Auth Mode",
"type": "string"
},
"connection_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Connection Name"
},
"resource_path": {
"title": "Resource Path",
"type": "string"
}
},
"required": [
"resource_path"
],
"title": "CloudStorageSettings",
"type": "object"
}
Fields:
- auth_mode (AuthMethod)
- connection_name (Optional[str])
- resource_path (str)
Validators:
- validate_auth_requirements → auth_mode
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
CloudStorageWriteSettings
pydantic-model
Bases: CloudStorageSettings, WriteSettingsWorkerInterface
Settings for writing to cloud storage
Show JSON schema:
{
"description": "Settings for writing to cloud storage",
"properties": {
"resource_path": {
"title": "Resource Path",
"type": "string"
},
"write_mode": {
"default": "overwrite",
"enum": [
"overwrite",
"append"
],
"title": "Write Mode",
"type": "string"
},
"file_format": {
"default": "parquet",
"enum": [
"csv",
"parquet",
"json",
"delta"
],
"title": "File Format",
"type": "string"
},
"parquet_compression": {
"default": "snappy",
"enum": [
"snappy",
"gzip",
"brotli",
"lz4",
"zstd"
],
"title": "Parquet Compression",
"type": "string"
},
"csv_delimiter": {
"default": ",",
"title": "Csv Delimiter",
"type": "string"
},
"csv_encoding": {
"default": "utf8",
"title": "Csv Encoding",
"type": "string"
},
"auth_mode": {
"default": "auto",
"enum": [
"access_key",
"iam_role",
"service_principal",
"managed_identity",
"sas_token",
"aws-cli",
"env_vars"
],
"title": "Auth Mode",
"type": "string"
},
"connection_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Connection Name"
}
},
"required": [
"resource_path"
],
"title": "CloudStorageWriteSettings",
"type": "object"
}
Fields:
- resource_path (str)
- write_mode (Literal['overwrite', 'append'])
- file_format (Literal['csv', 'parquet', 'json', 'delta'])
- parquet_compression (Literal['snappy', 'gzip', 'brotli', 'lz4', 'zstd'])
- csv_delimiter (str)
- csv_encoding (str)
- auth_mode (AuthMethod)
- connection_name (Optional[str])
Validators:
- validate_auth_requirements → auth_mode
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
get_write_setting_worker_interface()
Convert to a worker interface model without secrets.
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
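Comparing the field lists, WriteSettingsWorkerInterface has the same fields as CloudStorageWriteSettings minus `auth_mode` and `connection_name`. The conversion can therefore be pictured as a projection onto the worker fields. The sketch below works on plain dictionaries and is an assumption about the shape of the conversion, not the method's actual body; `to_worker_write_settings` is a hypothetical name.

```python
from typing import Any, Dict

# Field names taken from the WriteSettingsWorkerInterface schema in this reference.
WORKER_WRITE_FIELDS = (
    "resource_path", "write_mode", "file_format",
    "parquet_compression", "csv_delimiter", "csv_encoding",
)


def to_worker_write_settings(write_settings: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields the worker-side model declares."""
    return {k: write_settings[k] for k in WORKER_WRITE_FIELDS if k in write_settings}


designer_settings = {
    "resource_path": "s3://example-bucket/out.parquet",  # made-up example path
    "write_mode": "overwrite",
    "file_format": "parquet",
    "parquet_compression": "snappy",
    "csv_delimiter": ",",
    "csv_encoding": "utf8",
    "auth_mode": "access_key",
    "connection_name": "my-connection",
}
worker_settings = to_worker_write_settings(designer_settings)
```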
CloudStorageWriteSettingsWorkerInterface
pydantic-model
Bases: BaseModel
Settings for writing to cloud storage in worker context
Show JSON schema:
{
"$defs": {
"FullCloudStorageConnectionWorkerInterface": {
"description": "Internal model with decrypted secrets",
"properties": {
"storage_type": {
"enum": [
"s3",
"adls",
"gcs"
],
"title": "Storage Type",
"type": "string"
},
"auth_method": {
"enum": [
"access_key",
"iam_role",
"service_principal",
"managed_identity",
"sas_token",
"aws-cli",
"env_vars"
],
"title": "Auth Method",
"type": "string"
},
"connection_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "None",
"title": "Connection Name"
},
"aws_region": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Region"
},
"aws_access_key_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Access Key Id"
},
"aws_secret_access_key": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Secret Access Key"
},
"aws_role_arn": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Role Arn"
},
"aws_allow_unsafe_html": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Allow Unsafe Html"
},
"aws_session_token": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Session Token"
},
"azure_account_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Account Name"
},
"azure_account_key": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Account Key"
},
"azure_tenant_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Tenant Id"
},
"azure_client_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Client Id"
},
"azure_client_secret": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Client Secret"
},
"endpoint_url": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Endpoint Url"
},
"verify_ssl": {
"default": true,
"title": "Verify Ssl",
"type": "boolean"
}
},
"required": [
"storage_type",
"auth_method"
],
"title": "FullCloudStorageConnectionWorkerInterface",
"type": "object"
},
"WriteSettingsWorkerInterface": {
"description": "Settings for writing to cloud storage",
"properties": {
"resource_path": {
"title": "Resource Path",
"type": "string"
},
"write_mode": {
"default": "overwrite",
"enum": [
"overwrite",
"append"
],
"title": "Write Mode",
"type": "string"
},
"file_format": {
"default": "parquet",
"enum": [
"csv",
"parquet",
"json",
"delta"
],
"title": "File Format",
"type": "string"
},
"parquet_compression": {
"default": "snappy",
"enum": [
"snappy",
"gzip",
"brotli",
"lz4",
"zstd"
],
"title": "Parquet Compression",
"type": "string"
},
"csv_delimiter": {
"default": ",",
"title": "Csv Delimiter",
"type": "string"
},
"csv_encoding": {
"default": "utf8",
"title": "Csv Encoding",
"type": "string"
}
},
"required": [
"resource_path"
],
"title": "WriteSettingsWorkerInterface",
"type": "object"
}
},
"description": "Settings for writing to cloud storage in worker context",
"properties": {
"operation": {
"title": "Operation",
"type": "string"
},
"write_settings": {
"$ref": "#/$defs/WriteSettingsWorkerInterface"
},
"connection": {
"$ref": "#/$defs/FullCloudStorageConnectionWorkerInterface"
},
"flowfile_flow_id": {
"default": 1,
"title": "Flowfile Flow Id",
"type": "integer"
},
"flowfile_node_id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "string"
}
],
"default": -1,
"title": "Flowfile Node Id"
}
},
"required": [
"operation",
"write_settings",
"connection"
],
"title": "CloudStorageWriteSettingsWorkerInterface",
"type": "object"
}
Fields:
- operation (str)
- write_settings (WriteSettingsWorkerInterface)
- connection (FullCloudStorageConnectionWorkerInterface)
- flowfile_flow_id (int)
- flowfile_node_id (int | str)
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
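The schema above nests two models inside one payload and supplies defaults for the two identifier fields. The following sketch assembles such a payload as a plain dictionary and serializes it to JSON. The helper name `build_worker_write_payload`, the `operation` value, and the example path are hypothetical; the source does not document which operation strings are accepted.

```python
import json
from typing import Any, Dict, Union


def build_worker_write_payload(
    operation: str,
    write_settings: Dict[str, Any],
    connection: Dict[str, Any],
    flowfile_flow_id: int = 1,
    flowfile_node_id: Union[int, str] = -1,
) -> Dict[str, Any]:
    """Assemble the nested payload and check the keys the schema marks as required."""
    if "resource_path" not in write_settings:
        raise ValueError("write_settings.resource_path is required")
    for key in ("storage_type", "auth_method"):
        if key not in connection:
            raise ValueError(f"connection.{key} is required")
    return {
        "operation": operation,
        "write_settings": write_settings,
        "connection": connection,
        "flowfile_flow_id": flowfile_flow_id,
        "flowfile_node_id": flowfile_node_id,
    }


payload = build_worker_write_payload(
    operation="example_operation",  # hypothetical value, not taken from the source
    write_settings={"resource_path": "s3://example-bucket/out.parquet"},
    connection={"storage_type": "s3", "auth_method": "env_vars"},
)
payload_json = json.dumps(payload)
```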
FullCloudStorageConnection
pydantic-model
Bases: AuthSettingsInput
Internal model with decrypted secrets
Show JSON schema:
{
"description": "Internal model with decrypted secrets",
"properties": {
"storage_type": {
"enum": [
"s3",
"adls",
"gcs"
],
"title": "Storage Type",
"type": "string"
},
"auth_method": {
"enum": [
"access_key",
"iam_role",
"service_principal",
"managed_identity",
"sas_token",
"aws-cli",
"env_vars"
],
"title": "Auth Method",
"type": "string"
},
"connection_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "None",
"title": "Connection Name"
},
"aws_region": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Region"
},
"aws_access_key_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Access Key Id"
},
"aws_secret_access_key": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Secret Access Key"
},
"aws_role_arn": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Role Arn"
},
"aws_allow_unsafe_html": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Allow Unsafe Html"
},
"aws_session_token": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Session Token"
},
"azure_account_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Account Name"
},
"azure_account_key": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Account Key"
},
"azure_tenant_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Tenant Id"
},
"azure_client_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Client Id"
},
"azure_client_secret": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Client Secret"
},
"endpoint_url": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Endpoint Url"
},
"verify_ssl": {
"default": true,
"title": "Verify Ssl",
"type": "boolean"
}
},
"required": [
"storage_type",
"auth_method"
],
"title": "FullCloudStorageConnection",
"type": "object"
}
Fields:
- storage_type (CloudStorageType)
- auth_method (AuthMethod)
- connection_name (Optional[str])
- aws_region (Optional[str])
- aws_access_key_id (Optional[str])
- aws_secret_access_key (Optional[SecretStr])
- aws_role_arn (Optional[str])
- aws_allow_unsafe_html (Optional[bool])
- aws_session_token (Optional[SecretStr])
- azure_account_name (Optional[str])
- azure_account_key (Optional[SecretStr])
- azure_tenant_id (Optional[str])
- azure_client_id (Optional[str])
- azure_client_secret (Optional[SecretStr])
- endpoint_url (Optional[str])
- verify_ssl (bool)
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
get_worker_interface()
Convert to a public interface model without secrets.
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
FullCloudStorageConnectionInterface
pydantic-model
Bases: AuthSettingsInput
API response model - no secrets exposed
Show JSON schema:
{
"description": "API response model - no secrets exposed",
"properties": {
"storage_type": {
"enum": [
"s3",
"adls",
"gcs"
],
"title": "Storage Type",
"type": "string"
},
"auth_method": {
"enum": [
"access_key",
"iam_role",
"service_principal",
"managed_identity",
"sas_token",
"aws-cli",
"env_vars"
],
"title": "Auth Method",
"type": "string"
},
"connection_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "None",
"title": "Connection Name"
},
"aws_allow_unsafe_html": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Allow Unsafe Html"
},
"aws_region": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Region"
},
"aws_access_key_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Access Key Id"
},
"aws_role_arn": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Role Arn"
},
"azure_account_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Account Name"
},
"azure_tenant_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Tenant Id"
},
"azure_client_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Client Id"
},
"endpoint_url": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Endpoint Url"
},
"verify_ssl": {
"default": true,
"title": "Verify Ssl",
"type": "boolean"
}
},
"required": [
"storage_type",
"auth_method"
],
"title": "FullCloudStorageConnectionInterface",
"type": "object"
}
Fields:
- storage_type (CloudStorageType)
- auth_method (AuthMethod)
- connection_name (Optional[str])
- aws_allow_unsafe_html (Optional[bool])
- aws_region (Optional[str])
- aws_access_key_id (Optional[str])
- aws_role_arn (Optional[str])
- azure_account_name (Optional[str])
- azure_tenant_id (Optional[str])
- azure_client_id (Optional[str])
- endpoint_url (Optional[str])
- verify_ssl (bool)
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
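Compared with FullCloudStorageConnection, this model omits the four SecretStr fields: `aws_secret_access_key`, `aws_session_token`, `azure_account_key`, and `azure_client_secret`. A minimal sketch of that difference, working on plain dictionaries, is shown below. The function name `to_public_interface` is hypothetical and the code is not how the library builds its API responses.

```python
from typing import Any, Dict

# The four fields typed as Optional[SecretStr] on FullCloudStorageConnection.
SECRET_FIELDS = frozenset(
    {"aws_secret_access_key", "aws_session_token", "azure_account_key", "azure_client_secret"}
)


def to_public_interface(connection: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the connection without any secret-bearing fields."""
    return {k: v for k, v in connection.items() if k not in SECRET_FIELDS}


full_connection = {
    "storage_type": "s3",
    "auth_method": "access_key",
    "connection_name": "my-connection",
    "aws_region": "eu-west-1",
    "aws_access_key_id": "EXAMPLEKEYID",       # made-up example value
    "aws_secret_access_key": "example-secret",  # made-up example value
    "verify_ssl": True,
}
public_connection = to_public_interface(full_connection)
```

Filtering by an explicit deny-list of secret fields means a newly added non-secret field passes through automatically; an allow-list would be the stricter choice if new fields should stay private until reviewed.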
FullCloudStorageConnectionWorkerInterface
pydantic-model
Bases: AuthSettingsInput
Internal model with decrypted secrets
Show JSON schema:
{
"description": "Internal model with decrypted secrets",
"properties": {
"storage_type": {
"enum": [
"s3",
"adls",
"gcs"
],
"title": "Storage Type",
"type": "string"
},
"auth_method": {
"enum": [
"access_key",
"iam_role",
"service_principal",
"managed_identity",
"sas_token",
"aws-cli",
"env_vars"
],
"title": "Auth Method",
"type": "string"
},
"connection_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "None",
"title": "Connection Name"
},
"aws_region": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Region"
},
"aws_access_key_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Access Key Id"
},
"aws_secret_access_key": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Secret Access Key"
},
"aws_role_arn": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Role Arn"
},
"aws_allow_unsafe_html": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Allow Unsafe Html"
},
"aws_session_token": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Aws Session Token"
},
"azure_account_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Account Name"
},
"azure_account_key": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Account Key"
},
"azure_tenant_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Tenant Id"
},
"azure_client_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Client Id"
},
"azure_client_secret": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Azure Client Secret"
},
"endpoint_url": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Endpoint Url"
},
"verify_ssl": {
"default": true,
"title": "Verify Ssl",
"type": "boolean"
}
},
"required": [
"storage_type",
"auth_method"
],
"title": "FullCloudStorageConnectionWorkerInterface",
"type": "object"
}
Fields:
- storage_type (CloudStorageType)
- auth_method (AuthMethod)
- connection_name (Optional[str])
- aws_region (Optional[str])
- aws_access_key_id (Optional[str])
- aws_secret_access_key (Optional[str])
- aws_role_arn (Optional[str])
- aws_allow_unsafe_html (Optional[bool])
- aws_session_token (Optional[str])
- azure_account_name (Optional[str])
- azure_account_key (Optional[str])
- azure_tenant_id (Optional[str])
- azure_client_id (Optional[str])
- azure_client_secret (Optional[str])
- endpoint_url (Optional[str])
- verify_ssl (bool)
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
WriteSettingsWorkerInterface
pydantic-model
Bases: BaseModel
Settings for writing to cloud storage
Show JSON schema:
{
"description": "Settings for writing to cloud storage",
"properties": {
"resource_path": {
"title": "Resource Path",
"type": "string"
},
"write_mode": {
"default": "overwrite",
"enum": [
"overwrite",
"append"
],
"title": "Write Mode",
"type": "string"
},
"file_format": {
"default": "parquet",
"enum": [
"csv",
"parquet",
"json",
"delta"
],
"title": "File Format",
"type": "string"
},
"parquet_compression": {
"default": "snappy",
"enum": [
"snappy",
"gzip",
"brotli",
"lz4",
"zstd"
],
"title": "Parquet Compression",
"type": "string"
},
"csv_delimiter": {
"default": ",",
"title": "Csv Delimiter",
"type": "string"
},
"csv_encoding": {
"default": "utf8",
"title": "Csv Encoding",
"type": "string"
}
},
"required": [
"resource_path"
],
"title": "WriteSettingsWorkerInterface",
"type": "object"
}
Fields:
- resource_path (str)
- write_mode (Literal['overwrite', 'append'])
- file_format (Literal['csv', 'parquet', 'json', 'delta'])
- parquet_compression (Literal['snappy', 'gzip', 'brotli', 'lz4', 'zstd'])
- csv_delimiter (str)
- csv_encoding (str)
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
encrypt_for_worker(secret_value)
Encrypts a secret value for use in worker contexts. This is a placeholder function that simulates encryption. In practice, you would use a secure encryption method.
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
get_cloud_storage_write_settings_worker_interface(write_settings, connection, lf, flowfile_flow_id=1, flowfile_node_id=-1)
Convert to a worker interface model with hashed secrets.
Source code in flowfile_core/flowfile_core/schemas/cloud_storage_schemas.py
output_model
flowfile_core.schemas.output_model
Classes:
Name | Description |
---|---|
BaseItem | A base model for any item in a file system, like a file or directory. |
ExpressionRef | A reference to a single Polars expression, including its name and docstring. |
ExpressionsOverview | Represents a categorized list of available Polars expressions. |
FileColumn | Represents detailed schema and statistics for a single column (field). |
InstantFuncResult | Represents the result of a function that is expected to execute instantly. |
ItemInfo | Provides detailed information about a single item in an output directory. |
NodeData | A comprehensive model holding the complete state and data for a single node. |
NodeResult | Represents the execution result of a single node in a FlowGraph run. |
OutputDir | Represents the contents of a single output directory. |
OutputFile | Represents a single file in an output directory, extending BaseItem. |
OutputFiles | Represents a collection of files, typically within a directory. |
OutputTree | Represents a directory tree, including subdirectories. |
RunInformation | Contains summary information about a complete FlowGraph execution. |
TableExample | Represents a preview of a table, including schema and sample data. |
BaseItem
pydantic-model
Bases: BaseModel
A base model for any item in a file system, like a file or directory.
Show JSON schema:
{
"description": "A base model for any item in a file system, like a file or directory.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"path": {
"title": "Path",
"type": "string"
},
"size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Size"
},
"creation_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Creation Date"
},
"access_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Access Date"
},
"modification_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Modification Date"
},
"source_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Source Path"
},
"number_of_items": {
"default": -1,
"title": "Number Of Items",
"type": "integer"
}
},
"required": [
"name",
"path"
],
"title": "BaseItem",
"type": "object"
}
Fields:
- name (str)
- path (str)
- size (Optional[int])
- creation_date (Optional[datetime])
- access_date (Optional[datetime])
- modification_date (Optional[datetime])
- source_path (Optional[str])
- number_of_items (int)
Source code in flowfile_core/flowfile_core/schemas/output_model.py
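The fields above map naturally onto what the operating system reports for a path. As an illustration only, the sketch below fills a dictionary with the same keys from `os.stat`. The function name `describe_path` is hypothetical, and how Flowfile itself populates these fields (in particular `creation_date` and `number_of_items`) is an assumption.

```python
import os
import tempfile
from datetime import datetime
from typing import Any, Dict


def describe_path(path: str) -> Dict[str, Any]:
    """Collect BaseItem-style metadata for a file or directory."""
    stats = os.stat(path)
    return {
        "name": os.path.basename(path),
        "path": path,
        "size": stats.st_size,
        # On Unix st_ctime is the last metadata change, not the creation time.
        "creation_date": datetime.fromtimestamp(stats.st_ctime),
        "access_date": datetime.fromtimestamp(stats.st_atime),
        "modification_date": datetime.fromtimestamp(stats.st_mtime),
        "source_path": None,
        # Assumed convention: -1 (the schema default) for anything that is not a directory.
        "number_of_items": len(os.listdir(path)) if os.path.isdir(path) else -1,
    }


with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "example.csv")
    with open(file_path, "w") as handle:
        handle.write("a,b\n1,2\n")
    file_item = describe_path(file_path)
    dir_item = describe_path(tmp)
```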
ExpressionRef
pydantic-model
Bases: BaseModel
A reference to a single Polars expression, including its name and docstring.
Show JSON schema:
{
"description": "A reference to a single Polars expression, including its name and docstring.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"doc": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Doc"
}
},
"required": [
"name",
"doc"
],
"title": "ExpressionRef",
"type": "object"
}
Fields:
- name (str)
- doc (Optional[str])
Source code in flowfile_core/flowfile_core/schemas/output_model.py
ExpressionsOverview
pydantic-model
Bases: BaseModel
Represents a categorized list of available Polars expressions.
Show JSON schema:
{
"$defs": {
"ExpressionRef": {
"description": "A reference to a single Polars expression, including its name and docstring.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"doc": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Doc"
}
},
"required": [
"name",
"doc"
],
"title": "ExpressionRef",
"type": "object"
}
},
"description": "Represents a categorized list of available Polars expressions.",
"properties": {
"expression_type": {
"title": "Expression Type",
"type": "string"
},
"expressions": {
"items": {
"$ref": "#/$defs/ExpressionRef"
},
"title": "Expressions",
"type": "array"
}
},
"required": [
"expression_type",
"expressions"
],
"title": "ExpressionsOverview",
"type": "object"
}
Fields:
- expression_type (str)
- expressions (List[ExpressionRef])
Source code in flowfile_core/flowfile_core/schemas/output_model.py
FileColumn
pydantic-model
Bases: BaseModel
Represents detailed schema and statistics for a single column (field).
Show JSON schema:
{
"description": "Represents detailed schema and statistics for a single column (field).",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"data_type": {
"title": "Data Type",
"type": "string"
},
"is_unique": {
"title": "Is Unique",
"type": "boolean"
},
"max_value": {
"title": "Max Value",
"type": "string"
},
"min_value": {
"title": "Min Value",
"type": "string"
},
"number_of_empty_values": {
"title": "Number Of Empty Values",
"type": "integer"
},
"number_of_filled_values": {
"title": "Number Of Filled Values",
"type": "integer"
},
"number_of_unique_values": {
"title": "Number Of Unique Values",
"type": "integer"
},
"size": {
"title": "Size",
"type": "integer"
}
},
"required": [
"name",
"data_type",
"is_unique",
"max_value",
"min_value",
"number_of_empty_values",
"number_of_filled_values",
"number_of_unique_values",
"size"
],
"title": "FileColumn",
"type": "object"
}
Fields:
- name (str)
- data_type (str)
- is_unique (bool)
- max_value (str)
- min_value (str)
- number_of_empty_values (int)
- number_of_filled_values (int)
- number_of_unique_values (int)
- size (int)
Source code in flowfile_core/flowfile_core/schemas/output_model.py
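To make the statistics concrete, the sketch below computes the same set of fields for a plain Python list. It is an illustration under assumptions, not Flowfile's profiling code: the function name `profile_column` is hypothetical, `size` is assumed to mean the total number of values, `is_unique` is assumed to mean that no filled value repeats, and the `data_type` string here is a Python type name rather than a Polars dtype.

```python
from typing import Any, Dict, List


def profile_column(name: str, values: List[Any]) -> Dict[str, Any]:
    """Compute FileColumn-style statistics, treating None as an empty value."""
    filled = [v for v in values if v is not None]
    unique = set(filled)
    return {
        "name": name,
        "data_type": type(filled[0]).__name__ if filled else "unknown",
        "is_unique": len(unique) == len(filled),
        # The schema types min and max as strings, so values are stringified.
        "max_value": str(max(filled)) if filled else "",
        "min_value": str(min(filled)) if filled else "",
        "number_of_empty_values": len(values) - len(filled),
        "number_of_filled_values": len(filled),
        "number_of_unique_values": len(unique),
        "size": len(values),
    }


age_profile = profile_column("age", [31, 25, None, 31])
```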
InstantFuncResult
pydantic-model
Bases: BaseModel
Represents the result of a function that is expected to execute instantly.
Show JSON schema:
{
"description": "Represents the result of a function that is expected to execute instantly.",
"properties": {
"success": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"title": "Success"
},
"result": {
"title": "Result",
"type": "string"
}
},
"required": [
"result"
],
"title": "InstantFuncResult",
"type": "object"
}
Fields:
- success (Optional[bool])
- result (str)
Source code in flowfile_core/flowfile_core/schemas/output_model.py
ItemInfo
pydantic-model
Bases: OutputFile
Provides detailed information about a single item in an output directory.
Show JSON schema:
{
"description": "Provides detailed information about a single item in an output directory.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"path": {
"title": "Path",
"type": "string"
},
"size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Size"
},
"creation_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Creation Date"
},
"access_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Access Date"
},
"modification_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Modification Date"
},
"source_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Source Path"
},
"number_of_items": {
"default": -1,
"title": "Number Of Items",
"type": "integer"
},
"ext": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Ext"
},
"mimetype": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Mimetype"
},
"id": {
"default": -1,
"title": "Id",
"type": "integer"
},
"type": {
"title": "Type",
"type": "string"
},
"analysis_file_available": {
"default": false,
"title": "Analysis File Available",
"type": "boolean"
},
"analysis_file_location": {
"default": null,
"title": "Analysis File Location",
"type": "string"
},
"analysis_file_error": {
"default": null,
"title": "Analysis File Error",
"type": "string"
}
},
"required": [
"name",
"path",
"type"
],
"title": "ItemInfo",
"type": "object"
}
Fields:
- name (str)
- path (str)
- size (Optional[int])
- creation_date (Optional[datetime])
- access_date (Optional[datetime])
- modification_date (Optional[datetime])
- source_path (Optional[str])
- number_of_items (int)
- ext (Optional[str])
- mimetype (Optional[str])
- id (int)
- type (str)
- analysis_file_available (bool)
- analysis_file_location (str)
- analysis_file_error (str)
Source code in flowfile_core/flowfile_core/schemas/output_model.py
NodeData
pydantic-model
Bases: BaseModel
A comprehensive model holding the complete state and data for a single node.
This includes its input/output data previews, settings, and run status.
Show JSON schema:
{
"$defs": {
"FileColumn": {
"description": "Represents detailed schema and statistics for a single column (field).",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"data_type": {
"title": "Data Type",
"type": "string"
},
"is_unique": {
"title": "Is Unique",
"type": "boolean"
},
"max_value": {
"title": "Max Value",
"type": "string"
},
"min_value": {
"title": "Min Value",
"type": "string"
},
"number_of_empty_values": {
"title": "Number Of Empty Values",
"type": "integer"
},
"number_of_filled_values": {
"title": "Number Of Filled Values",
"type": "integer"
},
"number_of_unique_values": {
"title": "Number Of Unique Values",
"type": "integer"
},
"size": {
"title": "Size",
"type": "integer"
}
},
"required": [
"name",
"data_type",
"is_unique",
"max_value",
"min_value",
"number_of_empty_values",
"number_of_filled_values",
"number_of_unique_values",
"size"
],
"title": "FileColumn",
"type": "object"
},
"TableExample": {
"description": "Represents a preview of a table, including schema and sample data.",
"properties": {
"node_id": {
"title": "Node Id",
"type": "integer"
},
"number_of_records": {
"title": "Number Of Records",
"type": "integer"
},
"number_of_columns": {
"title": "Number Of Columns",
"type": "integer"
},
"name": {
"title": "Name",
"type": "string"
},
"table_schema": {
"items": {
"$ref": "#/$defs/FileColumn"
},
"title": "Table Schema",
"type": "array"
},
"columns": {
"items": {
"type": "string"
},
"title": "Columns",
"type": "array"
},
"data": {
"anyOf": [
{
"items": {
"type": "object"
},
"type": "array"
},
{
"type": "null"
}
],
"default": {},
"title": "Data"
}
},
"required": [
"node_id",
"number_of_records",
"number_of_columns",
"name",
"table_schema",
"columns"
],
"title": "TableExample",
"type": "object"
}
},
"description": "A comprehensive model holding the complete state and data for a single node.\n\nThis includes its input/output data previews, settings, and run status.",
"properties": {
"flow_id": {
"title": "Flow Id",
"type": "integer"
},
"node_id": {
"title": "Node Id",
"type": "integer"
},
"flow_type": {
"title": "Flow Type",
"type": "string"
},
"left_input": {
"anyOf": [
{
"$ref": "#/$defs/TableExample"
},
{
"type": "null"
}
],
"default": null
},
"right_input": {
"anyOf": [
{
"$ref": "#/$defs/TableExample"
},
{
"type": "null"
}
],
"default": null
},
"main_input": {
"anyOf": [
{
"$ref": "#/$defs/TableExample"
},
{
"type": "null"
}
],
"default": null
},
"main_output": {
"anyOf": [
{
"$ref": "#/$defs/TableExample"
},
{
"type": "null"
}
],
"default": null
},
"left_output": {
"anyOf": [
{
"$ref": "#/$defs/TableExample"
},
{
"type": "null"
}
],
"default": null
},
"right_output": {
"anyOf": [
{
"$ref": "#/$defs/TableExample"
},
{
"type": "null"
}
],
"default": null
},
"has_run": {
"default": false,
"title": "Has Run",
"type": "boolean"
},
"is_cached": {
"default": false,
"title": "Is Cached",
"type": "boolean"
},
"setting_input": {
"default": null,
"title": "Setting Input"
}
},
"required": [
"flow_id",
"node_id",
"flow_type"
],
"title": "NodeData",
"type": "object"
}
Fields:
- flow_id (int)
- node_id (int)
- flow_type (str)
- left_input (Optional[TableExample])
- right_input (Optional[TableExample])
- main_input (Optional[TableExample])
- main_output (Optional[TableExample])
- left_output (Optional[TableExample])
- right_output (Optional[TableExample])
- has_run (bool)
- is_cached (bool)
- setting_input (Any)
Source code in flowfile_core/flowfile_core/schemas/output_model.py
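A NodeData payload carries six optional preview slots, so a common first step is to check which of them are filled. The sketch below works on a plain dictionary shaped like the schema above. The helper `populated_previews`, the sample payload, and the `"filter"` value for `flow_type` are hypothetical and only serve as illustration; they are not part of flowfile-core.

```python
from typing import Any, Dict, List

# Preview slots defined on NodeData, in the order listed in the schema.
PREVIEW_SLOTS = (
    "left_input", "right_input", "main_input",
    "main_output", "left_output", "right_output",
)


def populated_previews(node_data: Dict[str, Any]) -> List[str]:
    """Return the names of preview slots that are present and not null."""
    return [slot for slot in PREVIEW_SLOTS if node_data.get(slot) is not None]


# Hypothetical payload: the required fields plus a single output preview.
sample_node = {
    "flow_id": 1,
    "node_id": 7,
    "flow_type": "filter",  # assumed value, for illustration only
    "main_output": {
        "node_id": 7,
        "number_of_records": 2,
        "number_of_columns": 1,
        "name": "preview",
        "table_schema": [],
        "columns": ["city"],
        "data": [{"city": "Utrecht"}, {"city": "Delft"}],
    },
    "has_run": True,
}

print(populated_previews(sample_node))
```

Because every preview slot defaults to null, a missing key and an explicit null are treated the same way here.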
NodeResult
pydantic-model
Bases: BaseModel
Represents the execution result of a single node in a FlowGraph run.
Show JSON schema:
{
"description": "Represents the execution result of a single node in a FlowGraph run.",
"properties": {
"node_id": {
"title": "Node Id",
"type": "integer"
},
"node_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Node Name"
},
"start_timestamp": {
"title": "Start Timestamp",
"type": "number"
},
"end_timestamp": {
"default": 0,
"title": "End Timestamp",
"type": "number"
},
"success": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"title": "Success"
},
"error": {
"default": "",
"title": "Error",
"type": "string"
},
"run_time": {
"default": -1,
"title": "Run Time",
"type": "integer"
},
"is_running": {
"default": true,
"title": "Is Running",
"type": "boolean"
}
},
"required": [
"node_id"
],
"title": "NodeResult",
"type": "object"
}
Fields:
- node_id (int)
- node_name (Optional[str])
- start_timestamp (float)
- end_timestamp (float)
- success (Optional[bool])
- error (str)
- run_time (int)
- is_running (bool)
Source code in flowfile_core/flowfile_core/schemas/output_model.py
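The defaults in this schema (`is_running` true, `success` null, `end_timestamp` 0) suggest that a result starts out as "running" and is filled in later. The following sketch turns a NodeResult-shaped dictionary into a status label and an elapsed time. The priority order of the checks, the meaning of `end_timestamp == 0`, and the helper names `node_status` and `elapsed` are assumptions made for illustration, not behavior confirmed by the source.

```python
from typing import Any, Dict


def node_status(result: Dict[str, Any]) -> str:
    """Map a NodeResult-shaped dict to a short status label.

    Assumed reading of the fields: is_running takes priority,
    then success True/False, otherwise the outcome is unknown.
    """
    if result.get("is_running", True):  # schema default is true
        return "running"
    success = result.get("success")  # schema default is null
    if success is True:
        return "succeeded"
    if success is False:
        return "failed"
    return "unknown"


def elapsed(result: Dict[str, Any]) -> float:
    """Return end minus start timestamp, or 0.0 if the node has not finished.

    Assumes both timestamps share one unit and that end_timestamp == 0
    (the schema default) means "not finished yet".
    """
    end = result.get("end_timestamp", 0)
    if not end:
        return 0.0
    return float(end) - float(result["start_timestamp"])


# Hypothetical finished result.
finished = {
    "node_id": 3,
    "start_timestamp": 100.0,
    "end_timestamp": 102.5,
    "success": True,
    "is_running": False,
}

print(node_status(finished), elapsed(finished))
```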
OutputDir
pydantic-model
Bases: BaseItem
Represents the contents of a single output directory.
Show JSON schema:
{
"$defs": {
"ItemInfo": {
"description": "Provides detailed information about a single item in an output directory.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"path": {
"title": "Path",
"type": "string"
},
"size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Size"
},
"creation_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Creation Date"
},
"access_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Access Date"
},
"modification_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Modification Date"
},
"source_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Source Path"
},
"number_of_items": {
"default": -1,
"title": "Number Of Items",
"type": "integer"
},
"ext": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Ext"
},
"mimetype": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Mimetype"
},
"id": {
"default": -1,
"title": "Id",
"type": "integer"
},
"type": {
"title": "Type",
"type": "string"
},
"analysis_file_available": {
"default": false,
"title": "Analysis File Available",
"type": "boolean"
},
"analysis_file_location": {
"default": null,
"title": "Analysis File Location",
"type": "string"
},
"analysis_file_error": {
"default": null,
"title": "Analysis File Error",
"type": "string"
}
},
"required": [
"name",
"path",
"type"
],
"title": "ItemInfo",
"type": "object"
}
},
"description": "Represents the contents of a single output directory.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"path": {
"title": "Path",
"type": "string"
},
"size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Size"
},
"creation_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Creation Date"
},
"access_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Access Date"
},
"modification_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Modification Date"
},
"source_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Source Path"
},
"number_of_items": {
"default": -1,
"title": "Number Of Items",
"type": "integer"
},
"all_items": {
"items": {
"type": "string"
},
"title": "All Items",
"type": "array"
},
"items": {
"items": {
"$ref": "#/$defs/ItemInfo"
},
"title": "Items",
"type": "array"
}
},
"required": [
"name",
"path",
"all_items",
"items"
],
"title": "OutputDir",
"type": "object"
}
Fields:
- name (str)
- path (str)
- size (Optional[int])
- creation_date (Optional[datetime])
- access_date (Optional[datetime])
- modification_date (Optional[datetime])
- source_path (Optional[str])
- number_of_items (int)
- all_items (List[str])
- items (List[ItemInfo])
Source code in flowfile_core/flowfile_core/schemas/output_model.py
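Since `items` is a list of ItemInfo objects with an optional `ext` field, a directory listing can be summarized by extension. This is a minimal sketch over a plain dictionary shaped like the schema above. The helper `count_by_extension`, the sample values, and the extension format (`"csv"` without a leading dot) are assumptions for illustration.

```python
from collections import Counter
from typing import Any, Dict


def count_by_extension(output_dir: Dict[str, Any]) -> Dict[str, int]:
    """Count the items of an OutputDir-shaped dict per extension.

    Items whose ext is null or missing are grouped under "(none)".
    """
    counts = Counter(item.get("ext") or "(none)" for item in output_dir["items"])
    return dict(counts)


# Hypothetical directory payload; the "type" values are assumed.
sample_dir = {
    "name": "out",
    "path": "/tmp/out",
    "all_items": ["a.csv", "b.csv", "notes"],
    "items": [
        {"name": "a.csv", "path": "/tmp/out/a.csv", "type": "file", "ext": "csv"},
        {"name": "b.csv", "path": "/tmp/out/b.csv", "type": "file", "ext": "csv"},
        {"name": "notes", "path": "/tmp/out/notes", "type": "file", "ext": None},
    ],
}

print(count_by_extension(sample_dir))
```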
OutputFile
pydantic-model
Bases: BaseItem
Represents a single file in an output directory, extending BaseItem.
Show JSON schema:
{
"description": "Represents a single file in an output directory, extending BaseItem.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"path": {
"title": "Path",
"type": "string"
},
"size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Size"
},
"creation_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Creation Date"
},
"access_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Access Date"
},
"modification_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Modification Date"
},
"source_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Source Path"
},
"number_of_items": {
"default": -1,
"title": "Number Of Items",
"type": "integer"
},
"ext": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Ext"
},
"mimetype": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Mimetype"
}
},
"required": [
"name",
"path"
],
"title": "OutputFile",
"type": "object"
}
Fields:
- name (str)
- path (str)
- size (Optional[int])
- creation_date (Optional[datetime])
- access_date (Optional[datetime])
- modification_date (Optional[datetime])
- source_path (Optional[str])
- number_of_items (int)
- ext (Optional[str])
- mimetype (Optional[str])
Source code in flowfile_core/flowfile_core/schemas/output_model.py
OutputFiles
pydantic-model
Bases: BaseItem
Represents a collection of files, typically within a directory.
Show JSON schema:
{
"$defs": {
"OutputFile": {
"description": "Represents a single file in an output directory, extending BaseItem.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"path": {
"title": "Path",
"type": "string"
},
"size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Size"
},
"creation_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Creation Date"
},
"access_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Access Date"
},
"modification_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Modification Date"
},
"source_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Source Path"
},
"number_of_items": {
"default": -1,
"title": "Number Of Items",
"type": "integer"
},
"ext": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Ext"
},
"mimetype": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Mimetype"
}
},
"required": [
"name",
"path"
],
"title": "OutputFile",
"type": "object"
}
},
"description": "Represents a collection of files, typically within a directory.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"path": {
"title": "Path",
"type": "string"
},
"size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Size"
},
"creation_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Creation Date"
},
"access_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Access Date"
},
"modification_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Modification Date"
},
"source_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Source Path"
},
"number_of_items": {
"default": -1,
"title": "Number Of Items",
"type": "integer"
},
"files": {
"items": {
"$ref": "#/$defs/OutputFile"
},
"title": "Files",
"type": "array"
}
},
"required": [
"name",
"path"
],
"title": "OutputFiles",
"type": "object"
}
Fields:
- name (str)
- path (str)
- size (Optional[int])
- creation_date (Optional[datetime])
- access_date (Optional[datetime])
- modification_date (Optional[datetime])
- source_path (Optional[str])
- number_of_items (int)
- files (List[OutputFile])
Source code in flowfile_core/flowfile_core/schemas/output_model.py
OutputTree
pydantic-model
Bases: OutputFiles
Represents a directory tree, including subdirectories.
Show JSON schema:
{
"$defs": {
"OutputFile": {
"description": "Represents a single file in an output directory, extending BaseItem.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"path": {
"title": "Path",
"type": "string"
},
"size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Size"
},
"creation_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Creation Date"
},
"access_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Access Date"
},
"modification_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Modification Date"
},
"source_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Source Path"
},
"number_of_items": {
"default": -1,
"title": "Number Of Items",
"type": "integer"
},
"ext": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Ext"
},
"mimetype": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Mimetype"
}
},
"required": [
"name",
"path"
],
"title": "OutputFile",
"type": "object"
},
"OutputFiles": {
"description": "Represents a collection of files, typically within a directory.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"path": {
"title": "Path",
"type": "string"
},
"size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Size"
},
"creation_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Creation Date"
},
"access_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Access Date"
},
"modification_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Modification Date"
},
"source_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Source Path"
},
"number_of_items": {
"default": -1,
"title": "Number Of Items",
"type": "integer"
},
"files": {
"items": {
"$ref": "#/$defs/OutputFile"
},
"title": "Files",
"type": "array"
}
},
"required": [
"name",
"path"
],
"title": "OutputFiles",
"type": "object"
}
},
"description": "Represents a directory tree, including subdirectories.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"path": {
"title": "Path",
"type": "string"
},
"size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"title": "Size"
},
"creation_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Creation Date"
},
"access_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Access Date"
},
"modification_date": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Modification Date"
},
"source_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Source Path"
},
"number_of_items": {
"default": -1,
"title": "Number Of Items",
"type": "integer"
},
"files": {
"items": {
"$ref": "#/$defs/OutputFile"
},
"title": "Files",
"type": "array"
},
"directories": {
"items": {
"$ref": "#/$defs/OutputFiles"
},
"title": "Directories",
"type": "array"
}
},
"required": [
"name",
"path"
],
"title": "OutputTree",
"type": "object"
}
Fields:
- name (str)
- path (str)
- size (Optional[int])
- creation_date (Optional[datetime])
- access_date (Optional[datetime])
- modification_date (Optional[datetime])
- source_path (Optional[str])
- number_of_items (int)
- files (List[OutputFile])
- directories (List[OutputFiles])
Source code in flowfile_core/flowfile_core/schemas/output_model.py
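According to the schema, an OutputTree holds its own `files` plus a list of `directories`, each of which is an OutputFiles object with its own `files`. The sketch below adds up the file sizes across both levels of a dictionary shaped that way. The helper names and sample paths are hypothetical, and treating a null `size` as 0 is an assumption.

```python
from typing import Any, Dict, Iterable


def _sum_sizes(files: Iterable[Dict[str, Any]]) -> int:
    """Sum the size field of OutputFile-shaped dicts, counting null as 0."""
    return sum(f.get("size") or 0 for f in files)


def total_file_size(tree: Dict[str, Any]) -> int:
    """Total size of the top-level files and the files in each directory."""
    total = _sum_sizes(tree.get("files", []))
    for directory in tree.get("directories", []):
        total += _sum_sizes(directory.get("files", []))
    return total


# Hypothetical tree: one top-level file and one subdirectory with two files.
sample_tree = {
    "name": "out",
    "path": "/out",
    "files": [{"name": "a.csv", "path": "/out/a.csv", "size": 10}],
    "directories": [
        {
            "name": "sub",
            "path": "/out/sub",
            "files": [
                {"name": "b.csv", "path": "/out/sub/b.csv", "size": 5},
                {"name": "c.csv", "path": "/out/sub/c.csv", "size": None},
            ],
        }
    ],
}

print(total_file_size(sample_tree))
```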
RunInformation
pydantic-model
Bases: BaseModel
Contains summary information about a complete FlowGraph execution.
Show JSON schema:
{
"$defs": {
"NodeResult": {
"description": "Represents the execution result of a single node in a FlowGraph run.",
"properties": {
"node_id": {
"title": "Node Id",
"type": "integer"
},
"node_name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Node Name"
},
"start_timestamp": {
"title": "Start Timestamp",
"type": "number"
},
"end_timestamp": {
"default": 0,
"title": "End Timestamp",
"type": "number"
},
"success": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"title": "Success"
},
"error": {
"default": "",
"title": "Error",
"type": "string"
},
"run_time": {
"default": -1,
"title": "Run Time",
"type": "integer"
},
"is_running": {
"default": true,
"title": "Is Running",
"type": "boolean"
}
},
"required": [
"node_id"
],
"title": "NodeResult",
"type": "object"
}
},
"description": "Contains summary information about a complete FlowGraph execution.",
"properties": {
"flow_id": {
"title": "Flow Id",
"type": "integer"
},
"start_time": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"title": "Start Time"
},
"end_time": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "End Time"
},
"success": {
"title": "Success",
"type": "boolean"
},
"nodes_completed": {
"default": 0,
"title": "Nodes Completed",
"type": "integer"
},
"number_of_nodes": {
"default": 0,
"title": "Number Of Nodes",
"type": "integer"
},
"node_step_result": {
"items": {
"$ref": "#/$defs/NodeResult"
},
"title": "Node Step Result",
"type": "array"
}
},
"required": [
"flow_id",
"success",
"node_step_result"
],
"title": "RunInformation",
"type": "object"
}
Fields:
- flow_id (int)
- start_time (Optional[datetime])
- end_time (Optional[datetime])
- success (bool)
- nodes_completed (int)
- number_of_nodes (int)
- node_step_result (List[NodeResult])
Source code in flowfile_core/flowfile_core/schemas/output_model.py
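The counters `nodes_completed` and `number_of_nodes`, together with the per-node entries in `node_step_result`, are enough to derive a progress figure and a list of failing nodes. The sketch below does this for a dictionary shaped like the schema above. The helper `summarize_run` and the sample payload are hypothetical, and reading `success == false` as "this node failed" is an assumption.

```python
from typing import Any, Dict


def summarize_run(run_info: Dict[str, Any]) -> Dict[str, Any]:
    """Derive a completion fraction and the IDs of failed nodes."""
    total = run_info.get("number_of_nodes", 0)
    done = run_info.get("nodes_completed", 0)
    # Avoid dividing by zero when the run has no nodes registered yet.
    fraction = done / total if total else 0.0
    failed = [
        step["node_id"]
        for step in run_info["node_step_result"]
        if step.get("success") is False
    ]
    return {"fraction_completed": fraction, "failed_node_ids": failed}


# Hypothetical run: four nodes, three finished, one of them with an error.
sample_run = {
    "flow_id": 1,
    "success": False,
    "nodes_completed": 3,
    "number_of_nodes": 4,
    "node_step_result": [
        {"node_id": 1, "success": True, "is_running": False},
        {"node_id": 2, "success": True, "is_running": False},
        {"node_id": 3, "success": False, "error": "boom", "is_running": False},
    ],
}

print(summarize_run(sample_run))
```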
TableExample
pydantic-model
Bases: BaseModel
Represents a preview of a table, including schema and sample data.
Show JSON schema:
{
"$defs": {
"FileColumn": {
"description": "Represents detailed schema and statistics for a single column (field).",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"data_type": {
"title": "Data Type",
"type": "string"
},
"is_unique": {
"title": "Is Unique",
"type": "boolean"
},
"max_value": {
"title": "Max Value",
"type": "string"
},
"min_value": {
"title": "Min Value",
"type": "string"
},
"number_of_empty_values": {
"title": "Number Of Empty Values",
"type": "integer"
},
"number_of_filled_values": {
"title": "Number Of Filled Values",
"type": "integer"
},
"number_of_unique_values": {
"title": "Number Of Unique Values",
"type": "integer"
},
"size": {
"title": "Size",
"type": "integer"
}
},
"required": [
"name",
"data_type",
"is_unique",
"max_value",
"min_value",
"number_of_empty_values",
"number_of_filled_values",
"number_of_unique_values",
"size"
],
"title": "FileColumn",
"type": "object"
}
},
"description": "Represents a preview of a table, including schema and sample data.",
"properties": {
"node_id": {
"title": "Node Id",
"type": "integer"
},
"number_of_records": {
"title": "Number Of Records",
"type": "integer"
},
"number_of_columns": {
"title": "Number Of Columns",
"type": "integer"
},
"name": {
"title": "Name",
"type": "string"
},
"table_schema": {
"items": {
"$ref": "#/$defs/FileColumn"
},
"title": "Table Schema",
"type": "array"
},
"columns": {
"items": {
"type": "string"
},
"title": "Columns",
"type": "array"
},
"data": {
"anyOf": [
{
"items": {
"type": "object"
},
"type": "array"
},
{
"type": "null"
}
],
"default": {},
"title": "Data"
}
},
"required": [
"node_id",
"number_of_records",
"number_of_columns",
"name",
"table_schema",
"columns"
],
"title": "TableExample",
"type": "object"
}
Fields:
- node_id (int)
- number_of_records (int)
- number_of_columns (int)
- name (str)
- table_schema (List[FileColumn])
- columns (List[str])
- data (Optional[List[Dict]])
Source code in flowfile_core/flowfile_core/schemas/output_model.py
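A TableExample pairs an ordered `columns` list with `data`, a list of objects. A minimal sketch of turning that into ordered rows follows. It assumes each object in `data` is one record keyed by column name, which the schema does not state explicitly; the helper `preview_rows` and the sample values are hypothetical.

```python
from typing import Any, Dict, List


def preview_rows(table: Dict[str, Any]) -> List[List[Any]]:
    """Return the sample data as rows ordered like table["columns"].

    An empty or null data field yields an empty list of rows.
    """
    columns = table["columns"]
    data = table.get("data") or []
    return [[record.get(col) for col in columns] for record in data]


# Hypothetical preview with two columns and two records.
sample_table = {
    "node_id": 7,
    "number_of_records": 2,
    "number_of_columns": 2,
    "name": "preview",
    "table_schema": [],
    "columns": ["city", "rank"],
    "data": [
        {"rank": 1, "city": "Utrecht"},
        {"rank": 2, "city": "Delft"},
    ],
}

for row in preview_rows(sample_table):
    print(row)
```

Ordering by `columns` rather than by dictionary key order keeps the rows aligned with the header even when the records list their keys differently.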
Web API
This section documents the FastAPI routes that expose flowfile-core's functionality over HTTP.
routes
flowfile_core.routes.routes
Main API router and endpoint definitions for the Flowfile application.
This module sets up the FastAPI router and defines the API endpoints for interacting with flows, nodes, files, and other core components of the application. It handles the logic for creating, reading, updating, and deleting these resources.
Functions:
Name | Description |
---|---|
add_generic_settings | A generic endpoint to update the settings of any node. |
add_node | Adds a new, unconfigured node (a "promise") to the flow graph. |
cancel_flow | Cancels a currently running flow execution. |
close_flow | Closes an active flow session. |
connect_node | Creates a connection (edge) between two nodes in the flow graph. |
copy_node | Copies an existing node's settings to a new node promise. |
create_db_connection | Creates and securely stores a new database connection. |
create_directory | Creates a new directory at the specified path. |
create_flow | Creates a new, empty flow file at the specified path and registers a session for it. |
delete_db_connection | Deletes a stored database connection. |
delete_node | Deletes a node from the flow graph. |
delete_node_connection | Deletes a connection (edge) between two nodes. |
get_active_flow_file_sessions | Retrieves a list of all currently active flow sessions. |
get_current_directory_contents | Gets the contents of the file explorer's current directory. |
get_current_files | Gets the contents of the file explorer's current directory. |
get_current_path | Returns the current absolute path of the file explorer. |
get_db_connections | Retrieves all stored database connections for the current user (without passwords). |
get_description_node | Retrieves the description text for a specific node. |
get_directory_contents | Gets the contents of an arbitrary directory path. |
get_downstream_node_ids | Gets a list of all node IDs that are downstream dependencies of a given node. |
get_excel_sheet_names | Retrieves the sheet names from an Excel file. |
get_expression_doc | Retrieves documentation for available Polars expressions. |
get_expressions | Retrieves a list of all available Flowfile expression names. |
get_flow | Retrieves the settings for a specific flow. |
get_flow_frontend_data | Retrieves the data needed to render the flow graph in the frontend. |
get_flow_settings | Retrieves the main settings for a flow. |
get_generated_code | Generates and returns a Python script with Polars code representing the flow. |
get_graphic_walker_input | Gets the data and configuration for the Graphic Walker data exploration tool. |
get_instant_function_result | Executes a simple, instant function on a node's data and returns the result. |
get_list_of_saved_flows | Scans a directory for saved flow files (.flowfile). |
get_local_files | Retrieves a list of files from a specified local directory. |
get_node | Retrieves the complete state and data preview for a single node. |
get_node_list | Retrieves the list of all available node types and their templates. |
get_node_model | (Internal) Retrieves a node's Pydantic model from the input_schema module by its name. |
get_run_status | Retrieves the run status information for a specific flow. |
get_table_example | Retrieves a data preview (schema and sample rows) for a node's output. |
get_vue_flow_data | Retrieves the flow data formatted for the Vue-based frontend. |
import_saved_flow | Imports a flow from a saved |
navigate_into_directory | Navigates the file explorer into a specified subdirectory. |
navigate_to_directory | Navigates the file explorer to an absolute directory path. |
navigate_up | Navigates the file explorer one directory level up. |
register_flow | Registers a new flow session with the application. |
run_flow | Executes a flow in a background task. |
save_flow | Saves the current state of a flow to a |
update_description_node | Updates the description text for a specific node. |
update_flow_settings | Updates the main settings for a flow. |
upload_file | Uploads a file to the server's 'uploads' directory. |
validate_db_settings | Validates that a connection can be made to a database with the given settings. |
add_generic_settings(input_data, node_type, current_user=Depends(get_current_active_user))
A generic endpoint to update the settings of any node.
This endpoint dynamically determines the correct Pydantic model and update function based on the node_type parameter.
Source code in flowfile_core/flowfile_core/routes/routes.py
add_node(flow_id, node_id, node_type, pos_x=0, pos_y=0)
Adds a new, unconfigured node (a "promise") to the flow graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flow_id | int | The ID of the flow to add the node to. | required |
node_id | int | The client-generated ID for the new node. | required |
node_type | str | The type of the node to add (e.g., 'filter', 'join'). | required |
pos_x | int | The X coordinate for the node's position in the UI. | 0 |
pos_y | int | The Y coordinate for the node's position in the UI. | 0 |
Source code in flowfile_core/flowfile_core/routes/routes.py
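All five parameters of add_node are scalars, and FastAPI treats scalar function arguments that are not part of the path as query parameters by default. Assuming that default applies here (the route path and HTTP method are not shown in this section, so neither appears below), a client could assemble the query string as in this sketch. The helper `add_node_query` is hypothetical.

```python
from urllib.parse import urlencode


def add_node_query(
    flow_id: int,
    node_id: int,
    node_type: str,
    pos_x: int = 0,
    pos_y: int = 0,
) -> str:
    """Build a query string mirroring the add_node signature and defaults."""
    return urlencode(
        {
            "flow_id": flow_id,
            "node_id": node_id,
            "node_type": node_type,
            "pos_x": pos_x,
            "pos_y": pos_y,
        }
    )


# 'filter' is one of the example node types named in the parameter table.
print(add_node_query(flow_id=1, node_id=5, node_type="filter"))
```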
cancel_flow(flow_id)
Cancels a currently running flow execution.
Source code in flowfile_core/flowfile_core/routes/routes.py
close_flow(flow_id)
Closes an active flow session.
Source code in flowfile_core/flowfile_core/routes/routes.py
connect_node(flow_id, node_connection)
Creates a connection (edge) between two nodes in the flow graph.
Source code in flowfile_core/flowfile_core/routes/routes.py
copy_node(node_id_to_copy_from, flow_id_to_copy_from, node_promise)
Copies an existing node's settings to a new node promise.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_id_to_copy_from | int | The ID of the node to copy the settings from. | required |
flow_id_to_copy_from | int | The ID of the flow containing the source node. | required |
node_promise | NodePromise | A | required |
Source code in flowfile_core/flowfile_core/routes/routes.py
create_db_connection(input_connection, current_user=Depends(get_current_active_user), db=Depends(get_db))
Creates and securely stores a new database connection.
Source code in flowfile_core/flowfile_core/routes/routes.py
create_directory(new_directory)
Creates a new directory at the specified path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_directory | NewDirectory | An | required |
Returns:
Type | Description |
---|---|
bool | |
Source code in flowfile_core/flowfile_core/routes/routes.py
create_flow(flow_path)
Creates a new, empty flow file at the specified path and registers a session for it.
Source code in flowfile_core/flowfile_core/routes/routes.py
delete_db_connection(connection_name, current_user=Depends(get_current_active_user), db=Depends(get_db))
Deletes a stored database connection.
Source code in flowfile_core/flowfile_core/routes/routes.py
delete_node(flow_id, node_id)
Deletes a node from the flow graph.
Source code in flowfile_core/flowfile_core/routes/routes.py
delete_node_connection(flow_id, node_connection=None)
Deletes a connection (edge) between two nodes.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_active_flow_file_sessions()
async
Retrieves a list of all currently active flow sessions.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_current_directory_contents(file_types=None, include_hidden=False)
async
Gets the contents of the file explorer's current directory.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_current_files()
async
Gets the contents of the file explorer's current directory.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_current_path()
async
Returns the current absolute path of the file explorer.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_db_connections(db=Depends(get_db), current_user=Depends(get_current_active_user))
Retrieves all stored database connections for the current user (without passwords).
Source code in flowfile_core/flowfile_core/routes/routes.py
get_description_node(flow_id, node_id)
Retrieves the description text for a specific node.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_directory_contents(directory, file_types=None, include_hidden=False)
async
Gets the contents of an arbitrary directory path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
directory | str | The absolute path to the directory. | required |
file_types | List[str] | An optional list of file extensions to filter by. | None |
include_hidden | bool | If True, includes hidden files and directories. | False |
Returns:
Type | Description |
---|---|
List[FileInfo] | A list of |
Source code in flowfile_core/flowfile_core/routes/routes.py
get_downstream_node_ids(flow_id, node_id)
async
Gets a list of all node IDs that are downstream dependencies of a given node.
Source code in flowfile_core/flowfile_core/routes/routes.py
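Finding every node downstream of a given node is a standard graph traversal. The sketch below shows a generic breadth-first search over an adjacency mapping; it illustrates the idea only and is not flowfile-core's implementation. The `edges` mapping (node ID to the IDs it feeds into) and the helper `downstream_ids` are hypothetical.

```python
from collections import deque
from typing import Dict, List


def downstream_ids(edges: Dict[int, List[int]], start: int) -> List[int]:
    """Return all node IDs reachable from start, in breadth-first order.

    The start node itself is not included in the result.
    """
    seen = set()
    order: List[int] = []
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already reached through another branch
        seen.add(node)
        order.append(node)
        queue.extend(edges.get(node, []))
    return order


# Hypothetical diamond-shaped flow: 1 feeds 2 and 3, which both feed 4.
edges = {1: [2, 3], 2: [4], 3: [4], 4: []}

print(downstream_ids(edges, 1))
```

The `seen` set ensures that a node reached through several branches, such as node 4 above, is reported only once.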
get_excel_sheet_names(path)
async
Retrieves the sheet names from an Excel file.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_expression_doc()
Retrieves documentation for available Polars expressions.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_expressions()
Retrieves a list of all available Flowfile expression names.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_flow(flow_id)
Retrieves the settings for a specific flow.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_flow_frontend_data(flow_id=1)
Retrieves the data needed to render the flow graph in the frontend.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_flow_settings(flow_id=1)
Retrieves the main settings for a flow.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_generated_code(flow_id)
Generates and returns a Python script with Polars code representing the flow.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_graphic_walker_input(flow_id, node_id)
Gets the data and configuration for the Graphic Walker data exploration tool.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_instant_function_result(flow_id, node_id, func_string)
async
Executes a simple, instant function on a node's data and returns the result.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_list_of_saved_flows(path)
Scans a directory for saved flow files (.flowfile).
Source code in flowfile_core/flowfile_core/routes/routes.py
get_local_files(directory)
async
Retrieves a list of files from a specified local directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
directory | str | The absolute path of the directory to scan. | required |
Returns:
Type | Description |
---|---|
List[FileInfo] | A list of FileInfo objects. |
Raises:
Type | Description |
---|---|
HTTPException | 404 if the directory does not exist. |
Source code in flowfile_core/flowfile_core/routes/routes.py
get_node(flow_id, node_id, get_data=False)
Retrieves the complete state and data preview for a single node.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_node_list()
Retrieves the list of all available node types and their templates.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_node_model(setting_name_ref)
(Internal) Retrieves a node's Pydantic model from the input_schema module by its name.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_run_status(flow_id, response)
Retrieves the run status information for a specific flow.
Returns a 202 Accepted status while the flow is running, and 200 OK when finished.
Source code in flowfile_core/flowfile_core/routes/routes.py
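Because the route answers 202 Accepted while a flow is running and 200 OK once it has finished, a client can poll it in a loop. The sketch below keeps the HTTP call out of the picture: `fetch_status` is a hypothetical callable that you would implement with your HTTP client of choice and that returns the status code together with the decoded body. The route's URL is not given in this section, so it is not shown here.

```python
import time
from typing import Any, Callable, Tuple


def wait_for_flow(fetch_status: Callable[[], Tuple[int, Any]],
                  poll_interval: float = 1.0,
                  timeout: float = 60.0) -> Any:
    """Poll until the status endpoint stops answering 202 Accepted."""
    deadline = time.monotonic() + timeout
    while True:
        status_code, payload = fetch_status()
        if status_code == 200:
            return payload  # run finished, payload holds the run information
        if status_code != 202:
            raise RuntimeError(f"Unexpected status code: {status_code}")
        if time.monotonic() >= deadline:
            raise TimeoutError("Flow did not finish in time")
        time.sleep(poll_interval)  # still running, try again later
```

The `timeout` and `poll_interval` defaults are arbitrary choices for the sketch, not values taken from flowfile-core.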
get_table_example(flow_id, node_id)
Retrieves a data preview (schema and sample rows) for a node's output.
Source code in flowfile_core/flowfile_core/routes/routes.py
get_vue_flow_data(flow_id)
Retrieves the flow data formatted for the Vue-based frontend.
Source code in flowfile_core/flowfile_core/routes/routes.py
import_saved_flow(flow_path)
Imports a flow from a saved .flowfile and registers it as a new session.
Source code in flowfile_core/flowfile_core/routes/routes.py
navigate_into_directory(directory_name)
async
Navigates the file explorer into a specified subdirectory.
Source code in flowfile_core/flowfile_core/routes/routes.py
navigate_to_directory(directory_name)
async
Navigates the file explorer to an absolute directory path.
Source code in flowfile_core/flowfile_core/routes/routes.py
navigate_up()
async
Navigates the file explorer one directory level up.
Source code in flowfile_core/flowfile_core/routes/routes.py
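The three navigation routes (`navigate_into_directory`, `navigate_to_directory`, `navigate_up`) all move a "current directory" that the file explorer keeps on the server. A minimal sketch of that state, assuming a hypothetical `ExplorerState` class built on `pathlib`, might look like this. The real explorer object and its validation rules are not documented in this section.

```python
from pathlib import Path


class ExplorerState:
    """Hypothetical stand-in for the file explorer's current-directory state."""

    def __init__(self, start: str) -> None:
        self.current = Path(start).resolve()

    def navigate_up(self) -> Path:
        # The parent of the filesystem root is the root itself, so this is safe.
        self.current = self.current.parent
        return self.current

    def navigate_into(self, directory_name: str) -> Path:
        target = (self.current / directory_name).resolve()
        # Only accept real, direct children of the current directory.
        if target.parent != self.current or not target.is_dir():
            raise ValueError(f"Not a direct subdirectory: {directory_name}")
        self.current = target
        return self.current

    def navigate_to(self, directory: str) -> Path:
        target = Path(directory).resolve()
        if not target.is_dir():
            raise ValueError(f"Not a directory: {directory}")
        self.current = target
        return self.current
```

Resolving the path before comparing parents means a name such as `..` or a symlink pointing elsewhere is rejected by `navigate_into` in this sketch.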
register_flow(flow_data)
Registers a new flow session with the application.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flow_data | FlowSettings | The FlowSettings object. | required |
Returns:
Type | Description |
---|---|
int | The ID of the newly registered flow. |
Source code in flowfile_core/flowfile_core/routes/routes.py
run_flow(flow_id, background_tasks)
async
Executes a flow in a background task.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flow_id | int | The ID of the flow to execute. | required |
background_tasks | BackgroundTasks | FastAPI's background task runner. | required |
Returns:
Type | Description |
---|---|
JSONResponse | A JSON response indicating that the flow has started. |
Source code in flowfile_core/flowfile_core/routes/routes.py
save_flow(flow_id, flow_path=None)
Saves the current state of a flow to a .flowfile.
Source code in flowfile_core/flowfile_core/routes/routes.py
update_description_node(flow_id, node_id, description=Body(...))
Updates the description text for a specific node.
Source code in flowfile_core/flowfile_core/routes/routes.py
update_flow_settings(flow_settings)
Updates the main settings for a flow.
Source code in flowfile_core/flowfile_core/routes/routes.py
upload_file(file=File(...))
async
Uploads a file to the server's 'uploads' directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | UploadFile | The file to be uploaded. | File(...) |
Returns:
Type | Description |
---|---|
JSONResponse | A JSON response containing the filename and the path where it was saved. |
Source code in flowfile_core/flowfile_core/routes/routes.py
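The essence of an upload handler is writing the received bytes into a fixed directory and reporting where they ended up. The sketch below shows that step without FastAPI. The helper name `save_upload` and the response keys `filename` and `filepath` are assumptions for illustration. The description above only says the response contains the filename and the saved path.

```python
from pathlib import Path
from typing import Dict


def save_upload(filename: str, content: bytes,
                upload_dir: str = "uploads") -> Dict[str, str]:
    """Hypothetical helper: store uploaded bytes and report where they went."""
    # Keep only the final path component so "../../etc/passwd" cannot escape.
    safe_name = Path(filename).name
    if safe_name in ("", ".."):
        raise ValueError("Filename is empty or invalid")
    target_dir = Path(upload_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / safe_name
    target.write_bytes(content)
    return {"filename": safe_name, "filepath": str(target)}
```

Stripping directory components from the client-supplied name is a precaution added in this sketch. Whether the actual route does the same is not stated here.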
validate_db_settings(database_settings, current_user=Depends(get_current_active_user))
async
Validates that a connection can be made to a database with the given settings.
Source code in flowfile_core/flowfile_core/routes/routes.py
auth
flowfile_core.routes.auth
cloud_connections
flowfile_core.routes.cloud_connections
Functions:
Name | Description |
---|---|
create_cloud_storage_connection | Create a new cloud storage connection. |
delete_cloud_connection_with_connection_name | Delete a cloud connection. |
get_cloud_connections | Get all cloud storage connections for the current user. |
create_cloud_storage_connection(input_connection, current_user=Depends(get_current_active_user), db=Depends(get_db))
Create a new cloud storage connection.
Parameters:
input_connection: FullCloudStorageConnection schema containing connection details.
current_user: User obtained from Depends(get_current_active_user).
db: Session obtained from Depends(get_db).
Returns:
Dict with a success message.
Source code in flowfile_core/flowfile_core/routes/cloud_connections.py
delete_cloud_connection_with_connection_name(connection_name, current_user=Depends(get_current_active_user), db=Depends(get_db))
Delete a cloud connection.
Source code in flowfile_core/flowfile_core/routes/cloud_connections.py
get_cloud_connections(db=Depends(get_db), current_user=Depends(get_current_active_user))
Get all cloud storage connections for the current user.
Parameters:
db: Session obtained from Depends(get_db).
current_user: User obtained from Depends(get_current_active_user).
Returns:
List[FullCloudStorageConnectionInterface]
Source code in flowfile_core/flowfile_core/routes/cloud_connections.py
logs
flowfile_core.routes.logs
Functions:
Name | Description |
---|---|
add_log | Adds a log message to the log file for a given flow_id. |
add_raw_log | Adds a log message to the log file for a given flow_id. |
format_sse_message | Format the data as a proper SSE message. |
stream_logs | Streams logs for a given flow_id using Server-Sent Events. |
add_log(flow_id, log_message)
async
Adds a log message to the log file for a given flow_id.
Source code in flowfile_core/flowfile_core/routes/logs.py
add_raw_log(raw_log_input)
async
Adds a log message to the log file for a given flow_id.
Source code in flowfile_core/flowfile_core/routes/logs.py
format_sse_message(data)
async
Format the data as a proper SSE message
Source code in flowfile_core/flowfile_core/routes/logs.py
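For readers unfamiliar with the wire format: a Server-Sent Events frame is a block of `field: value` lines terminated by a blank line, and a multi-line payload needs one `data:` line per line of text. The sketch below builds such a frame. It is a synchronous illustration under the name `format_sse` (hypothetical), not the body of `format_sse_message`, whose exact output is not shown in this section.

```python
def format_sse(data: str, event: str = "") -> str:
    """Build one Server-Sent Events frame from a text payload."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


print(repr(format_sse("node 3 finished")))  # 'data: node 3 finished\n\n'
```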
stream_logs(flow_id, idle_timeout=300, current_user=Depends(get_current_user_from_query))
async
Streams logs for a given flow_id using Server-Sent Events. Requires authentication via a token passed as a query parameter. The connection closes gracefully if the server shuts down.
Source code in flowfile_core/flowfile_core/routes/logs.py
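The `idle_timeout` parameter suggests that the stream ends after a period with no new log output. One way to picture this is an async generator that tails a log file and stops once nothing has been appended for `idle_timeout` seconds. The sketch below is an assumption about the general technique, not the route's implementation: the name `tail_with_idle_timeout` and the `poll_interval` parameter are hypothetical, and authentication and server-shutdown handling are left out.

```python
import asyncio
import time
from pathlib import Path
from typing import AsyncIterator


async def tail_with_idle_timeout(path: Path,
                                 idle_timeout: float = 300.0,
                                 poll_interval: float = 0.5) -> AsyncIterator[str]:
    """Yield lines as they are appended; stop after `idle_timeout` idle seconds."""
    last_activity = time.monotonic()
    with path.open("r", encoding="utf-8") as handle:
        while True:
            line = handle.readline()
            if line:
                last_activity = time.monotonic()
                yield line.rstrip("\n")
            elif time.monotonic() - last_activity > idle_timeout:
                return  # nothing new for too long, end the stream
            else:
                # Yield control to the event loop while waiting for new output.
                await asyncio.sleep(poll_interval)
```

A production version would also need to handle partially written lines and client disconnects, which this sketch ignores.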
public
flowfile_core.routes.public
Functions:
Name | Description |
---|---|
docs_redirect | Redirects to the documentation page. |
docs_redirect()
async
Redirects to the documentation page.
Source code in flowfile_core/flowfile_core/routes/public.py
secrets
flowfile_core.routes.secrets
Manages CRUD (Create, Read, Update, Delete) operations for secrets.
This router provides secure endpoints for creating, retrieving, and deleting sensitive credentials for the authenticated user. Secrets are encrypted before being stored and are associated with the user's ID.
Functions:
Name | Description |
---|---|
create_secret | Creates a new secret for the authenticated user. |
delete_secret | Deletes a secret by name for the authenticated user. |
get_secret | Retrieves a specific secret by name for the authenticated user. |
get_secrets | Retrieves all secret names for the currently authenticated user. |
create_secret(secret, current_user=Depends(get_current_active_user), db=Depends(get_db))
async
Creates a new secret for the authenticated user.
The secret value is encrypted before being stored in the database. A secret name must be unique for a given user.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
secret | SecretInput | A SecretInput object. | required |
current_user | | The authenticated user object, injected by FastAPI. | Depends(get_current_active_user) |
db | Session | The database session, injected by FastAPI. | Depends(get_db) |
Raises:
Type | Description |
---|---|
HTTPException | 400 if a secret with the same name already exists for the user. |
Returns:
Type | Description |
---|---|
Secret | A Secret object. |
Source code in flowfile_core/flowfile_core/routes/secrets.py
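The two rules stated for `create_secret` (encrypt before storing, and one name per user) can be illustrated with an in-memory stand-in. Everything named below is hypothetical: `InMemorySecretStore` replaces the database session, and the `encrypt` callable is injected so that no particular cipher is implied. The real router's encryption routine is not described in this section.

```python
from typing import Callable, Dict, List, Tuple


class InMemorySecretStore:
    """Hypothetical store; the real router persists to a database session."""

    def __init__(self, encrypt: Callable[[str], bytes]) -> None:
        self._encrypt = encrypt
        self._rows: Dict[Tuple[int, str], bytes] = {}

    def create(self, user_id: int, name: str, value: str) -> str:
        key = (user_id, name)
        if key in self._rows:
            # The route reports this case as HTTP 400.
            raise ValueError(f"Secret '{name}' already exists for this user")
        self._rows[key] = self._encrypt(value)  # only ciphertext is kept
        return name

    def list_names(self, user_id: int) -> List[str]:
        # Names only: stored values are never returned to the caller.
        return sorted(n for (uid, n) in self._rows if uid == user_id)

    def delete(self, user_id: int, name: str) -> None:
        self._rows.pop((user_id, name), None)
```

Keying the rows by `(user_id, name)` makes names unique per user rather than globally, which matches the uniqueness rule described above. In real code `encrypt` should be a vetted encryption routine, never a reversible toy function.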
delete_secret(secret_name, current_user=Depends(get_current_active_user), db=Depends(get_db))
async
Deletes a secret by name for the authenticated user.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
secret_name | str | The name of the secret to delete. | required |
current_user | | The authenticated user object, injected by FastAPI. | Depends(get_current_active_user) |
db | Session | The database session, injected by FastAPI. | Depends(get_db) |
Returns:
Type | Description |
---|---|
None | An empty response with a 204 No Content status code upon success. |
Source code in flowfile_core/flowfile_core/routes/secrets.py
get_secret(secret_name, current_user=Depends(get_current_active_user), db=Depends(get_db))
async
Retrieves a specific secret by name for the authenticated user.
Note: This endpoint returns the secret name and metadata but does not expose the decrypted secret value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
secret_name | str | The name of the secret to retrieve. | required |
current_user | | The authenticated user object, injected by FastAPI. | Depends(get_current_active_user) |
db | Session | The database session, injected by FastAPI. | Depends(get_db) |
Raises:
Type | Description |
---|---|
HTTPException | 404 if the secret is not found. |
Returns:
Type | Description |
---|---|
Secret | A Secret object. |
Source code in flowfile_core/flowfile_core/routes/secrets.py
get_secrets(current_user=Depends(get_current_active_user), db=Depends(get_db))
async
Retrieves all secret names for the currently authenticated user.
Note: This endpoint returns the secret names and metadata but does not expose the decrypted secret values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
current_user | | The authenticated user object, injected by FastAPI. | Depends(get_current_active_user) |
db | Session | The database session, injected by FastAPI. | Depends(get_db) |
Returns:
Type | Description |
---|---|
A list of |
Source code in flowfile_core/flowfile_core/routes/secrets.py