Combine Nodes
Combine nodes allow you to merge multiple datasets in different ways, enabling data integration and enrichment. These nodes help in aligning, linking, and structuring data from various sources to create a unified dataset.
Depending on the method used, datasets can be merged by matching values, stacking rows, finding similar records, generating all possible combinations, or grouping related elements in a network.
These transformations are essential for tasks like data preparation, consolidation, and relationship mapping across datasets.
Node Details
Join
The Join node merges two datasets based on matching values in selected columns.
Key Features
- Supports multiple join types: Inner, Left, Right, Anti, Outer
- Join on one or more columns
- Handles duplicate column names with automatic renaming
Usage
- Connect two input datasets (left and right).
- Select join type (inner, left, right, anti, or outer).
- Choose columns to join on.
- Select which columns to keep from each dataset.
Configuration Options
Parameter | Description |
---|---|
Join Type | Choose inner, left, right, anti, or outer join. |
Join Columns | Columns used to match records between datasets. |
This node is useful for merging related datasets, such as combining customer data with orders or linking product details with inventory.
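The join types listed above differ only in which unmatched rows they keep. The following is a minimal sketch in plain Python, not the node's actual implementation. The `join_rows` function and the sample `customers` and `orders` data are hypothetical, and this sketch does not rename duplicate columns or fill missing columns with nulls.

```python
from collections import defaultdict

def join_rows(left, right, on, how="inner"):
    """Join two lists of dicts on the key columns named in `on`."""
    # Index the right-hand rows by their key values for fast lookup.
    index = defaultdict(list)
    for row in right:
        index[tuple(row[c] for c in on)].append(row)

    result = []
    matched_keys = set()
    for lrow in left:
        key = tuple(lrow[c] for c in on)
        matches = index.get(key, [])
        if how == "anti":
            # Anti join: keep only left rows that have no match on the right.
            if not matches:
                result.append(dict(lrow))
            continue
        if matches:
            matched_keys.add(key)
            for rrow in matches:
                # Left values win if a non-key column name appears on both sides.
                result.append({**rrow, **lrow})
        elif how in ("left", "outer"):
            # Unmatched left rows survive in left and outer joins.
            result.append(dict(lrow))

    if how in ("right", "outer"):
        # Unmatched right rows survive in right and outer joins.
        for key, rows in index.items():
            if key not in matched_keys:
                result.extend(dict(r) for r in rows)
    return result

# Hypothetical sample data: customers and their orders.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Alan"},
]
orders = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 1, "total": 12},
    {"customer_id": 4, "total": 99},
]

for how in ("inner", "left", "right", "anti", "outer"):
    rows = join_rows(customers, orders, ["customer_id"], how)
    print(how, len(rows))
```

With this data, the anti join returns the customers who have placed no orders, which is a common use for that join type.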
Fuzzy Match
The Fuzzy Match node joins datasets based on similar values instead of exact matches, using various matching algorithms.
Key Features
- Supports fuzzy matching algorithms (e.g., Levenshtein)
- Configurable similarity threshold
- Calculates match scores
- Joins datasets based on approximate values
Usage
- Connect two datasets (left and right).
- Select columns to match on.
- Choose a fuzzy matching algorithm.
- Set a similarity threshold (e.g., 75%).
Configuration Options
Parameter | Description |
---|---|
Join Columns | Columns used for fuzzy matching. |
Fuzzy Algorithm | Choose an algorithm (e.g., Levenshtein). |
Threshold Score | Minimum similarity score for a match (0-100). |
This node is useful for handling typos, name variations, and inconsistent formatting when merging datasets.
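To illustrate how a threshold score can work with Levenshtein distance, here is a small sketch in plain Python. It assumes the score is the edit distance normalized by the longer string's length and scaled to 0-100. The node's actual scoring formula may differ, and the function names and sample values are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """Score from 0 to 100, where 100 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)

def fuzzy_match(left, right, threshold=75.0):
    """Return (left, right, score) for every pair at or above the threshold."""
    matches = []
    for l in left:
        for r in right:
            score = similarity(l, r)
            if score >= threshold:
                matches.append((l, r, round(score, 1)))
    return matches

# Hypothetical sample values with a typo and a formatting difference.
left_names = ["Jonathan Smith", "Acme Corp"]
right_names = ["Jonathon Smith", "Acme Corp.", "Globex"]

matches = fuzzy_match(left_names, right_names, threshold=75.0)
for match in matches:
    print(match)
```

Raising the threshold makes matching stricter and reduces false positives. Lowering it catches more variations at the cost of more spurious pairs.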
Union Data
The Union Data node merges multiple datasets by stacking rows together.
Key Features
- Combines multiple datasets into one
- Automatically aligns columns based on names
- Uses diagonal relaxed mode: columns missing from a dataset are filled with nulls, and mismatched column types are cast to a common type
Usage
- Connect multiple input datasets.
- The node will automatically align and stack the data.
This node is useful for combining similar datasets, such as monthly reports or regional data.
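The name-based alignment can be sketched in plain Python as follows. This is an illustration only: it assumes that columns missing from an input are filled with nulls (`None` here), and it does not show any type coercion. The `union_rows` function and the sample data are hypothetical.

```python
def union_rows(*datasets):
    """Stack lists of dicts, aligning columns by name."""
    # Collect every column name in first-seen order.
    columns = []
    for rows in datasets:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Emit every row with the full column set; missing values become None.
    return [
        {name: row.get(name) for name in columns}
        for rows in datasets
        for row in rows
    ]

# Hypothetical monthly reports; February has an extra column.
january = [{"region": "North", "sales": 100}]
february = [{"region": "South", "sales": 120, "returns": 3}]

stacked = union_rows(january, february)
for row in stacked:
    print(row)
```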
Cross Join
The Cross Join node creates all possible combinations between two datasets.
Key Features
- Generates a Cartesian product of two datasets
- Automatically aligns columns
- Handles duplicate column names
Usage
- Connect two datasets (left and right).
- Select the columns to keep and set their output names.
- The node will generate all possible row combinations.
This node is useful for creating test scenarios, generating all possible product combinations, or building comparison matrices.
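A Cartesian product pairs every left row with every right row, so the output has (left rows × right rows) rows. The sketch below shows the idea with `itertools.product`. The `_right` suffix used for duplicate column names is an assumption made for illustration, not necessarily the node's naming scheme, and the sample data is hypothetical.

```python
from itertools import product

def cross_join(left, right, suffix="_right"):
    """Pair every row of `left` with every row of `right`."""
    result = []
    for lrow, rrow in product(left, right):
        combined = dict(lrow)
        for name, value in rrow.items():
            # Rename right-hand columns that clash with a left-hand column.
            out_name = name + suffix if name in lrow else name
            combined[out_name] = value
        result.append(combined)
    return result

# Hypothetical product options; both datasets use the column name "name".
sizes = [{"name": "S"}, {"name": "M"}]
colors = [{"name": "red"}, {"name": "blue"}, {"name": "green"}]

combinations = cross_join(sizes, colors)
print(len(combinations))
print(combinations[0])
```

Because the output grows multiplicatively, a cross join of two large datasets can become very large. Filtering the inputs first keeps the result manageable.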
Graph Solver
The Graph Solver node groups related records based on connections in a graph-structured dataset.
Key Features
- Identifies connected components in graph-like data
- Groups related nodes into the same category
- Supports custom output column names
Usage
- Select From and To columns to define relationships.
- The node assigns a group identifier to connected nodes.
Configuration Options
Parameter | Description |
---|---|
From Column | Defines the starting point of each connection. |
To Column | Defines the endpoint of each connection. |
Output Column | Stores the assigned group identifier. |
This node is useful for detecting dependencies, clustering related entities, and analyzing network connections.
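Finding connected components can be sketched with a union-find structure. This is one common technique and not necessarily the one the node uses. Each tuple below stands for one row's From and To values. The `connected_groups` function, the sample edges, and the numbering of groups from 1 are all assumptions made for illustration.

```python
def connected_groups(edges):
    """Map each node to a group identifier using union-find."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the path as we go
            node = parent[node]
        return node

    for source, target in edges:
        root_a, root_b = find(source), find(target)
        if root_a != root_b:
            # Merge the two components.
            parent[root_b] = root_a

    # Number the groups in order of first appearance.
    group_ids = {}
    result = {}
    for node in list(parent):
        root = find(node)
        if root not in group_ids:
            group_ids[root] = len(group_ids) + 1
        result[node] = group_ids[root]
    return result

# Hypothetical From/To pairs: A-B-C are linked, and D-E are linked.
edges = [("A", "B"), ("B", "C"), ("D", "E")]

groups = connected_groups(edges)
for node, group in groups.items():
    print(node, group)
```

Nodes that are linked directly or through intermediate nodes receive the same identifier, which corresponds to the value stored in the Output Column.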