Combine Nodes

Combine nodes merge multiple datasets in different ways, enabling data integration and enrichment. They align, link, and structure data from various sources into a unified dataset.

Depending on the method used, datasets can be merged by matching values, stacking rows, finding similar records, generating all possible combinations, or grouping related elements in a network.

These transformations are essential for tasks like data preparation, consolidation, and relationship mapping across datasets.

Node Details

Join

The Join node merges two datasets based on matching values in selected columns.

Key Features

Supports multiple join types: Inner, Left, Right, Anti, Outer
Join on one or more columns
Handles duplicate column names with automatic renaming

Usage

Connect two input datasets (left and right).
Select join type (inner, left, right, anti or outer).
Choose columns to join on.
Select which columns to keep from each dataset.

Configuration Options

Parameter	Description
Join Type	Choose `inner`, `left`, `right`, `anti` or `outer` join.
Join Columns	Columns used to match records between datasets.

This node is useful for merging related datasets, such as combining customer data with orders or linking product details with inventory.
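To illustrate the join types listed above, here is a minimal pure-Python sketch of a hash join over lists of row dictionaries. It is not the node's implementation. The function name `join`, the `_right` suffix used to rename clashing columns, and the rule that null keys never match are assumptions chosen for the example. Column selection is left out.

```python
from typing import Any, Optional

Row = dict[str, Any]
JOIN_TYPES = {"inner", "left", "right", "outer", "anti"}


def _columns(rows: list[Row]) -> list[str]:
    """Column names in order of first appearance across all rows."""
    cols: list[str] = []
    for row in rows:
        for col in row:
            if col not in cols:
                cols.append(col)
    return cols


def join(left: list[Row], right: list[Row], on: list[str],
         how: str = "inner", suffix: str = "_right") -> list[Row]:
    """Hash join two row lists on the key columns in `on` (illustrative sketch)."""
    if how not in JOIN_TYPES:
        raise ValueError(f"unsupported join type: {how!r}")

    left_cols = _columns(left)
    # Non-key right columns; names that clash with left columns get a suffix
    # (hypothetical renaming rule).
    right_extra = {c: (c + suffix if c in left_cols else c)
                   for c in _columns(right) if c not in on}

    # Index right rows by key tuple; rows with a null key never match.
    index: dict[tuple, list[int]] = {}
    for i, row in enumerate(right):
        key = tuple(row.get(c) for c in on)
        if None not in key:
            index.setdefault(key, []).append(i)

    def combine(l_row: Optional[Row], r_row: Optional[Row]) -> Row:
        out = {c: (l_row or {}).get(c) for c in left_cols}
        if l_row is None:  # right-only row: take key values from the right side
            out.update({c: r_row.get(c) for c in on})
        for c, name in right_extra.items():
            out[name] = (r_row or {}).get(c)
        return out

    result: list[Row] = []
    matched_right: set[int] = set()
    for l_row in left:
        key = tuple(l_row.get(c) for c in on)
        hits = [] if None in key else index.get(key, [])
        if how == "anti":  # keep only left rows that have no match
            if not hits:
                result.append(dict(l_row))
            continue
        for i in hits:
            matched_right.add(i)
            result.append(combine(l_row, right[i]))
        if not hits and how in ("left", "outer"):
            result.append(combine(l_row, None))  # unmatched left row, nulls on the right

    if how in ("right", "outer"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                result.append(combine(None, r_row))  # unmatched right row
    return result
```

With `customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]` and orders keyed by `id`, an inner join keeps only Ada's orders. A left join also keeps Bob with a null `amount`, and an anti join returns just Bob.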

Fuzzy Match

The Fuzzy Match node joins datasets based on similar values instead of exact matches, using various matching algorithms.

Key Features

Supports fuzzy matching algorithms (e.g., Levenshtein)
Configurable similarity threshold
Calculates match scores
Joins datasets based on approximate values

Usage

Connect two datasets (left and right).
Select columns to match on.
Choose a fuzzy matching algorithm.
Set a similarity threshold (e.g., 75%).

Configuration Options

Parameter	Description
Join Columns	Columns used for fuzzy matching.
Fuzzy Algorithm	Choose an algorithm (e.g., `Levenshtein`).
Threshold Score	Minimum similarity score for a match (0-100).

This node is useful for handling typos, name variations, and inconsistent formatting when merging datasets.
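The sketch below shows one way a Levenshtein-based fuzzy join with a 0-100 threshold can work. The score formula (`100 * (longest - distance) / longest`), the helper names, and the `match_score` output column are assumptions for illustration. The node may normalize scores differently or support other algorithms.

```python
from typing import Any

Row = dict[str, Any]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """0-100 score, 100 meaning identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - levenshtein(a, b)) / longest


def fuzzy_join(left: list[Row], right: list[Row], left_col: str, right_col: str,
               threshold: float = 75.0) -> list[Row]:
    """Pair each left row with every right row scoring at or above `threshold`."""
    matches: list[Row] = []
    for l_row in left:
        for r_row in right:  # all-pairs comparison: len(left) * len(right) scores
            score = similarity(str(l_row[left_col]), str(r_row[right_col]))
            if score >= threshold:
                # Right values overwrite left ones on a column-name clash.
                matches.append({**l_row, **r_row, "match_score": round(score, 1)})
    return matches
```

For example, `"Jon Smith"` and `"John Smith"` differ by one inserted character, so they score 90 and pass a 75 threshold. Because every left row is compared with every right row, this all-pairs approach is only practical for small inputs.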

Union Data

The Union Data node merges multiple datasets by stacking rows together.

Key Features

Combines multiple datasets into one
Automatically aligns columns based on names
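Uses diagonal relaxed mode: columns present in only some inputs are kept and filled with nulls where missing, and columns with mismatched types are cast to a common type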

Usage

Connect multiple input datasets.
The node will automatically align and stack the data.

This node is useful for combining similar datasets, such as monthly reports or regional data.
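"Diagonal relaxed" is also the name Polars gives to one of its `concat` strategies. That strategy unions the column sets of all inputs, fills missing values with nulls, and coerces mismatched column types to a common supertype. Assuming the node behaves the same way, a simplified pure-Python sketch follows. Its type-widening rule (a mix of `int` and `float` becomes `float`, any other mix becomes `str`) is a hypothetical simplification of real supertype resolution.

```python
from typing import Any, Optional

Row = dict[str, Any]


def _widen_type(values: list[Any]) -> Optional[type]:
    """Type to cast a column to, or None if the column needs no cast (simplified)."""
    types = {type(v) for v in values if v is not None}
    if len(types) <= 1:
        return None
    if types <= {int, float}:
        return float
    return str  # simplified fallback: any other mix is widened to text


def union_diagonal_relaxed(*datasets: list[Row]) -> list[Row]:
    """Stack rows from all datasets, aligning columns by name."""
    # 1. Collect every column name, in order of first appearance.
    columns: list[str] = []
    for rows in datasets:
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)
    # 2. Stack the rows, filling columns a dataset lacks with None.
    stacked = [{col: row.get(col) for col in columns}
               for rows in datasets for row in rows]
    # 3. Cast mixed-type columns to a common type.
    for col in columns:
        target = _widen_type([row[col] for row in stacked])
        if target is None:
            continue
        for row in stacked:
            if row[col] is not None:
                row[col] = target(row[col])
    return stacked
```

Stacking `[{"region": "North", "sales": 100}]` with `[{"region": "South", "sales": 250.5, "returns": 3}]` yields both rows with a `returns` column (null for North) and `sales` widened to floats.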

Cross Join

The Cross Join node creates all possible combinations between two datasets.

Key Features

Generates a Cartesian product of two datasets
Places the columns of both datasets side by side in each output row
Handles duplicate column names

Usage

Connect two datasets (left and right).
Select the columns to keep and their output names.
The node will generate all possible row combinations.

This node is useful for creating test scenarios, generating all possible product combinations, or building comparison matrices.
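A cross join of datasets with m and n rows always produces m × n rows (two tables of 1,000 rows each yield 1,000,000 rows), so it is worth checking input sizes first. The minimal sketch below builds that Cartesian product with `itertools.product`. The `_right` suffix for clashing column names is a hypothetical choice, not necessarily what the node uses.

```python
from itertools import product
from typing import Any

Row = dict[str, Any]


def cross_join(left: list[Row], right: list[Row], suffix: str = "_right") -> list[Row]:
    """Every left row paired with every right row (Cartesian product)."""
    left_cols = {col for row in left for col in row}
    result: list[Row] = []
    # product() iterates the left rows in the outer loop and right rows in the inner loop.
    for l_row, r_row in product(left, right):
        out = dict(l_row)
        for col, value in r_row.items():
            # Rename right columns whose names already exist on the left.
            out[col + suffix if col in left_cols else col] = value
        result.append(out)
    return result
```

For example, three sizes crossed with two colors give six rows, starting with `{"size": "S", "color": "red"}` and `{"size": "S", "color": "blue"}`.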

Graph Solver

The Graph Solver node groups related records based on connections in a graph-structured dataset.

Key Features

Identifies connected components in graph-like data
Assigns connected nodes to the same group
Supports custom output column names

Usage

Select From and To columns to define relationships.
The node assigns a group identifier to connected nodes.

Configuration Options

Parameter	Description
From Column	Defines the starting point of each connection.
To Column	Defines the endpoint of each connection.
Output Column	Stores the assigned group identifier.

This node is useful for detecting dependencies, clustering related entities, and analyzing network connections.
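Grouping records by their connections is the classic connected-components problem. A minimal sketch using a union-find (disjoint-set) structure is shown below. It assumes the node appends the group identifier to each input row. The function name, the default `group` column name, and numbering groups from 1 in order of first appearance are all illustrative assumptions.

```python
from typing import Any

Row = dict[str, Any]


def graph_solver(rows: list[Row], from_col: str, to_col: str,
                 output_col: str = "group") -> list[Row]:
    """Label each row with the connected component its From/To pair belongs to."""
    parent: dict[Any, Any] = {}

    def find(x: Any) -> Any:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: Any, b: Any) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every From/To pair links two nodes into the same component.
    for row in rows:
        union(row[from_col], row[to_col])

    # Number components 1, 2, ... in order of first appearance.
    group_ids: dict[Any, int] = {}
    result: list[Row] = []
    for row in rows:
        root = find(row[from_col])
        group_ids.setdefault(root, len(group_ids) + 1)
        result.append({**row, output_col: group_ids[root]})
    return result
```

Here edges A→B, B→C, and C→A form one group and D→E forms another, so the rows receive groups 1, 1, 2, 1. Edge direction does not matter for grouping, since union-find treats each connection as undirected.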