Quick Start =========== This guide will help you get started with pl-fuzzy-frame-match in just a few minutes. Basic Example ------------- .. code-block:: python import polars as pl from pl_fuzzy_frame_match import fuzzy_match_dfs, FuzzyMapping import logging # Set up logger logger = logging.getLogger(__name__) # Create sample dataframes left_df = pl.DataFrame({ "company_name": ["Apple Inc", "Microsoft Corporation", "Google LLC"], "company_id": [1, 2, 3] }).lazy() right_df = pl.DataFrame({ "vendor_name": ["Apple", "Microsoft Corp", "Alphabet/Google"], "vendor_code": ["A001", "M001", "G001"] }).lazy() # Define fuzzy matching fuzzy_maps = [ FuzzyMapping( left_col="company_name", right_col="vendor_name", threshold_score=70.0, # 70% similarity fuzzy_type="jaro_winkler" ) ] # Perform matching result = fuzzy_match_dfs( left_df=left_df, right_df=right_df, fuzzy_maps=fuzzy_maps, logger=logger ) print(result) Understanding the Results ------------------------- The output dataframe will contain: * All columns from both input dataframes * A fuzzy score column (e.g., ``fuzzy_score_0``) with similarity scores between 0 and 1 * Only matches that meet or exceed your threshold score Available Algorithms -------------------- * **levenshtein**: Edit distance (insertions, deletions, substitutions) * **jaro**: Good for short strings * **jaro_winkler**: Enhanced Jaro, excellent for names * **hamming**: For equal-length strings * **damerau_levenshtein**: Includes transpositions * **indel**: Insertion/deletion distance only Choosing the Right Algorithm ---------------------------- * **Names**: Use ``jaro_winkler`` * **Addresses**: Use ``levenshtein`` * **Codes/IDs**: Use ``hamming`` (if same length) or ``levenshtein`` * **General text**: Use ``levenshtein`` or ``damerau_levenshtein`` Next Steps ---------- * See :doc:`examples` for more complex use cases * Check the :doc:`api` for detailed function documentation * Read about performance optimization for large datasets