Batch Cortex Search in Snowflake: Fuzzy Matching Millions of Records in One SQL Statement

Celestinfo Software Solutions Pvt. Ltd. May 21, 2026

Quick answer: Batch Cortex Search is a Snowflake-managed SQL table function called CORTEX_SEARCH_BATCH that runs many fuzzy-match queries against a Cortex Search Service in a single statement. It went generally available on May 18, 2026 and is purpose-built for entity resolution, customer identity matching, deduplication, and bulk enrichment workloads where you would otherwise loop over thousands of REST calls. Per Snowflake's documentation, jobs over roughly 2,000 queries see significantly higher throughput than the interactive API.

Last updated: May 2026

Why Batch Search Even Exists

Cortex Search, covered in our Cortex AI guide, has always been a great fit for one-at-a-time queries: a chat assistant looks up a single user's question and gets ranked results back in tens of milliseconds. That request/response shape is exactly what the interactive Python and REST surfaces are tuned for.

The problem starts when you flip the workload around. Picture a CRM cleanup job. You have 750,000 user-submitted addresses and a master address table. You want to find the closest match in master for each submission, scored, with a confidence threshold. Calling the interactive API 750,000 times burns through your network, your warehouse, your patience, and your wallet. People have been writing their own batching wrappers in Python for years.

Batch Cortex Search is Snowflake replacing that homegrown wrapper with a first-class SQL table function. You pass an input table, the engine fans out the queries against a dedicated batch compute pool, and you get a fully joined result set back.

What CORTEX_SEARCH_BATCH Actually Does

From the official documentation: "The CORTEX_SEARCH_BATCH function allows jobs submitted to a Cortex Search Service to leverage additional compute resources to provide significantly higher throughput than the interactive API search query surfaces."

In plain language, it is a lateral table function. You give it the name of a search service, a column of query strings, and optional filters, and it returns one ranked result row per input query (or per result if limit > 1).

The Function Signature

Snowflake's documentation defines the table function as follows:

SQL — function signature
CORTEX_SEARCH_BATCH(
    service_name => '<database>.<schema>.<cortex_search_service>',
    query => q.query,                   -- optional STRING
    multi_index_query => q.miq,         -- optional VARIANT
    filter => q.filter,                 -- optional VARIANT
    limit => 10,                        -- optional INT, default 10
    options => q.options                -- optional VARIANT
)

Parameter notes (from the Snowflake docs):

At least one of query, multi_index_query, or filter must be specified.

End-to-End Example: Matching User Submissions Against a Golden Catalog

The official docs show a clean example of product-catalog cleanup. Here it is, exactly as documented, so you can paste it into a Snowsight worksheet and follow along.

Step 1: Create the golden table you want to match against

SQL
CREATE OR REPLACE TABLE golden_catalog (product_name TEXT);
INSERT INTO golden_catalog VALUES
  ('Wireless Bluetooth Headphones'),
  ('Wireless Noise-Canceling Earbuds'),
  ('USB-C Charging Cable 6ft'),
  ('Portable Power Bank 10000mAh');

Step 2: Create the Cortex Search Service on top of it

SQL
CREATE CORTEX SEARCH SERVICE golden_product_service
ON product_name
WAREHOUSE = <warehouse_name>
TARGET_LAG = '1 day'
AS
SELECT product_name FROM golden_catalog;

The search service is responsible for indexing the column and refreshing the index according to TARGET_LAG. The batch function does not build the index itself; it queries this existing service.

Step 3: Create the input table with all the queries you want to run

SQL
CREATE OR REPLACE TABLE submitted_products (product TEXT);
INSERT INTO submitted_products VALUES
  ('bluetooth headphones wireless'),
  ('usb c cable');

Step 4: Run the batch search

SQL
SELECT
  q.product, s.*
FROM submitted_products AS q,
LATERAL CORTEX_SEARCH_BATCH(
    service_name => 'golden_product_service',
    query => q.product,
    limit => 2,
    options => OBJECT_CONSTRUCT(
        'scoring_config', OBJECT_CONSTRUCT(
            'disable_vector_embedding_query_prefix', true
        )
    )
) AS s;

That single statement runs both submitted queries against the search service and returns the top 2 matches for each, joined back to the original input. Swap the two-row input for a million-row table and you have a production-grade entity resolution pipeline.

When to Use Batch vs Interactive Search

Snowflake gives a useful rule of thumb in the documentation: if you have fewer than about 2,000 queries to run, "you'll typically get faster results using the interactive Cortex Search API (Python or REST API) rather than batch search." Batch has additional startup latency that is amortized over high-volume jobs.

Scenario Recommended surface
Chat assistant looking up a single user questionInteractive (Python / REST)
Sub-2,000 queries per jobInteractive (Python / REST)
Customer identity reconciliation across datasetsBatch (CORTEX_SEARCH_BATCH)
Product / address / company-name deduplicationBatch
Bulk enrichment of incoming data feedsBatch
Nightly clustering or relationship-discovery jobBatch

Tuning Throughput: The replicas Option

The options VARIANT accepts a replicas key. The default is 2. Higher values "complete jobs faster but don't increase total serving cost," per the docs. The serving cost is set by the index size and total job duration; replicas only change the parallelism, not the bill.

A reasonable approach: leave replicas at the default for the first run, look at the wall-clock time in QUERY_HISTORY, and only bump it if the latency is hurting downstream SLAs.

How Batch Cortex Search Is Billed

There are three distinct cost components, all spelled out in the docs:

  1. Serving cost — charged based on the size of the search index data and the duration of the batch search job.
  2. Query embedding cost — a charge for the number of tokens embedded as a result of the input queries. Interactive Cortex Search does not charge this separately, so it is worth tracking the input size of your batch job.
  3. Virtual warehouse cost — the warehouse compute used to run the batch job. Pick a warehouse sized for the orchestration overhead, not for the search itself. The serving cluster does the heavy lifting.

If you already model Cortex costs using our cost optimization guide, add a new line item for batch search and watch it for the first few weeks after rollout. We have seen teams underestimate the embedding cost on noisy text inputs.

Real-World Use Cases

Operational Tips From Early Adopters

How Batch Cortex Search Fits Into a Snowflake AI Stack

Cortex Search, Cortex Analyst, and the various AI functions are now overlapping enough that picking the right one matters. A short cheat sheet:


Key Takeaways


Ritish, Senior Data Engineer

Ritish helps CelestInfo clients design Snowflake platforms with a focus on AI workloads, large-scale data engineering, and operational reliability.

Related Articles

Frequently Asked Questions

What is Batch Cortex Search?

A Snowflake-managed SQL table function called CORTEX_SEARCH_BATCH that runs many fuzzy-match queries against a Cortex Search Service in a single statement. It became generally available on May 18, 2026.

When should I use Batch Cortex Search instead of the interactive API?

Use it when you have roughly 2,000 or more queries per job. For smaller workloads the interactive Python or REST API is typically faster because batch search has additional startup latency.

Do I still need to create a Cortex Search Service?

Yes. CORTEX_SEARCH_BATCH queries an existing service. You first create it with CREATE CORTEX SEARCH SERVICE on the target table, warehouse, and TARGET_LAG.

What does Batch Cortex Search cost?

Three components per the docs: serving cost (index size and job duration), query embedding cost (tokens embedded across all input queries), and the virtual warehouse compute used to run the job.

Can I tune throughput?

Yes, via the replicas key inside options. Higher values finish jobs faster but do not increase total serving cost. The default is 2.

Burning Questions
About CelestInfo

Simple answers to make things clear.

A SQL table function (CORTEX_SEARCH_BATCH) that runs many fuzzy-match queries against a Cortex Search Service in one statement. GA since May 18, 2026.

Roughly 2,000 queries or more → batch. Anything smaller → the interactive Python or REST API, which avoids batch startup latency.

Yes. Create one with CREATE CORTEX SEARCH SERVICE first; CORTEX_SEARCH_BATCH queries it.

Serving cost (index + duration), query embedding cost (tokens embedded), and the virtual warehouse compute used to run the job.

Set replicas higher in options. Higher replicas finish jobs faster but do not raise serving cost. Default is 2.

Still have questions?

Get Assistance

Ready? Let's Talk!

Get expert insights and answers tailored to your business requirements and transformation.

Get Assistance