Batch Cortex Search in Snowflake: Fuzzy Matching Millions of Records in One SQL Statement

Q: What is Batch Cortex Search?

Batch Cortex Search is a Snowflake-managed table function called CORTEX_SEARCH_BATCH that runs many fuzzy-match queries against a Cortex Search Service in a single SQL statement. It provides significantly higher throughput than the interactive Cortex Search API by allocating dedicated compute to the batch job. It became generally available on May 18, 2026.

Q: When should I use Batch Cortex Search instead of the interactive API?

Use batch search when you have roughly 2,000 or more queries to run in one job, such as entity resolution, customer identity matching, deduplication, or bulk enrichment. For fewer queries, the interactive Python or REST API is typically faster because batch search has additional startup latency.

Q: Do I still need to create a Cortex Search Service?

Yes. CORTEX_SEARCH_BATCH queries an existing Cortex Search Service. You first create the service with CREATE CORTEX SEARCH SERVICE, pointing it at the target table, warehouse, and TARGET_LAG, and then call CORTEX_SEARCH_BATCH against it.

Q: What does Batch Cortex Search cost?

Three components: a serving cost based on the size of the search index and the duration of the batch job, a query embedding cost based on the number of tokens embedded across all input queries, and the virtual warehouse compute used to run the job.

Q: Can I tune throughput?

Yes, via the replicas option inside the options parameter. Higher replica counts finish jobs faster but do not increase total serving cost. The default value is 2.

Ritish

Published May 21, 2026 · Last updated May 2026

Celestinfo Software Solutions Pvt. Ltd. • May 21, 2026

Quick answer: Batch Cortex Search is a Snowflake-managed SQL table function called CORTEX_SEARCH_BATCH that runs many fuzzy-match queries against a Cortex Search Service in a single statement. It went generally available on May 18, 2026 and is purpose-built for entity resolution, customer identity matching, deduplication, and bulk enrichment workloads where you would otherwise loop over thousands of REST calls. Per Snowflake's documentation, jobs over roughly 2,000 queries see significantly higher throughput than the interactive API.

Last updated: May 2026

Why Batch Search Even Exists

Cortex Search, covered in our Cortex AI guide, has always been a great fit for one-at-a-time queries: a chat assistant looks up a single user's question and gets ranked results back in tens of milliseconds. That request/response shape is exactly what the interactive Python and REST surfaces are tuned for.

The problem starts when you flip the workload around. Picture a CRM cleanup job. You have 750,000 user-submitted addresses and a master address table. You want to find the closest match in master for each submission, scored, with a confidence threshold. Calling the interactive API 750,000 times burns through your network, your warehouse, your patience, and your wallet. People have been writing their own batching wrappers in Python for years.

Batch Cortex Search is Snowflake replacing that homegrown wrapper with a first-class SQL table function. You pass an input table, the engine fans out the queries against a dedicated batch compute pool, and you get a fully joined result set back.

What CORTEX_SEARCH_BATCH Actually Does

From the official documentation: "The CORTEX_SEARCH_BATCH function allows jobs submitted to a Cortex Search Service to leverage additional compute resources to provide significantly higher throughput than the interactive API search query surfaces."

In plain language, it is a lateral table function. You give it the name of a search service, a column of query strings, and optional filters, and it returns one ranked result row per input query (or per result if limit > 1).

The Function Signature

Snowflake's documentation defines the table function as follows:

SQL — function signature

CORTEX_SEARCH_BATCH(
    service_name => '<database>.<schema>.<cortex_search_service>',
    query => q.query,                   -- optional STRING
    multi_index_query => q.miq,         -- optional VARIANT
    filter => q.filter,                 -- optional VARIANT
    limit => 10,                        -- optional INT, default 10
    options => q.options                -- optional VARIANT
)

Parameter notes (from the Snowflake docs):

service_name (required): fully-qualified name of the Cortex Search Service to query.
query (optional): column containing the query string for searching the service.
multi_index_query (optional): an object that specifies one or more vector or keyword query inputs, with a maximum of one vector index entry per query for performance.
filter (optional): column containing filter objects to apply to the search results.
limit (optional): maximum number of results to return per query. Default is 10.
options (optional): supports scoring_config and replicas (default 2).

At least one of query, multi_index_query, or filter must be specified.

End-to-End Example: Matching User Submissions Against a Golden Catalog

The official docs show a clean example of product-catalog cleanup. Here it is, exactly as documented, so you can paste it into a Snowsight worksheet and follow along.

Step 1: Create the golden table you want to match against

SQL

CREATE OR REPLACE TABLE golden_catalog (product_name TEXT);
INSERT INTO golden_catalog VALUES
  ('Wireless Bluetooth Headphones'),
  ('Wireless Noise-Canceling Earbuds'),
  ('USB-C Charging Cable 6ft'),
  ('Portable Power Bank 10000mAh');

Step 2: Create the Cortex Search Service on top of it

SQL

CREATE CORTEX SEARCH SERVICE golden_product_service
ON product_name
WAREHOUSE = <warehouse_name>
TARGET_LAG = '1 day'
AS
SELECT product_name FROM golden_catalog;

The search service is responsible for indexing the column and refreshing the index according to TARGET_LAG. The batch function does not build the index itself; it queries this existing service.

Step 3: Create the input table with all the queries you want to run

SQL

CREATE OR REPLACE TABLE submitted_products (product TEXT);
INSERT INTO submitted_products VALUES
  ('bluetooth headphones wireless'),
  ('usb c cable');

Step 4: Run the batch search

SQL

SELECT
  q.product, s.*
FROM submitted_products AS q,
LATERAL CORTEX_SEARCH_BATCH(
    service_name => 'golden_product_service',
    query => q.product,
    limit => 2,
    options => OBJECT_CONSTRUCT(
        'scoring_config', OBJECT_CONSTRUCT(
            'disable_vector_embedding_query_prefix', true
        )
    )
) AS s;

That single statement runs both submitted queries against the search service and returns the top 2 matches for each, joined back to the original input. Swap the two-row input for a million-row table and you have a production-grade entity resolution pipeline.

When to Use Batch vs Interactive Search

Snowflake gives a useful rule of thumb in the documentation: if you have fewer than about 2,000 queries to run, "you'll typically get faster results using the interactive Cortex Search API (Python or REST API) rather than batch search." Batch has additional startup latency that is amortized over high-volume jobs.

Scenario	Recommended surface
Chat assistant looking up a single user question	Interactive (Python / REST)
Sub-2,000 queries per job	Interactive (Python / REST)
Customer identity reconciliation across datasets	Batch (`CORTEX_SEARCH_BATCH`)
Product / address / company-name deduplication	Batch
Bulk enrichment of incoming data feeds	Batch
Nightly clustering or relationship-discovery job	Batch

Tuning Throughput: The `replicas` Option

The options VARIANT accepts a replicas key. The default is 2. Higher values "complete jobs faster but don't increase total serving cost," per the docs. The serving cost is set by the index size and total job duration; replicas only change the parallelism, not the bill.

A reasonable approach: leave replicas at the default for the first run, look at the wall-clock time in QUERY_HISTORY, and only bump it if the latency is hurting downstream SLAs.

How Batch Cortex Search Is Billed

There are three distinct cost components, all spelled out in the docs:

Serving cost — charged based on the size of the search index data and the duration of the batch search job.
Query embedding cost — a charge for the number of tokens embedded as a result of the input queries. Interactive Cortex Search does not charge this separately, so it is worth tracking the input size of your batch job.
Virtual warehouse cost — the warehouse compute used to run the batch job. Pick a warehouse sized for the orchestration overhead, not for the search itself. The serving cluster does the heavy lifting.

If you already model Cortex costs using our cost optimization guide, add a new line item for batch search and watch it for the first few weeks after rollout. We have seen teams underestimate the embedding cost on noisy text inputs.

Real-World Use Cases

Customer 360 identity resolution. Fuzzy-match incoming events against a canonical customer index across emails, names, and addresses. Run nightly to feed your dbt mart layer.
Product catalog cleanup. Reconcile SKUs from many supplier feeds against a master catalog. Pre-compute match candidates and a confidence score, then leave human review for the rows below a threshold.
Audience matching for advertising. Link customer identities across first-party and partner datasets when only inexact attributes are shared.
Compliance and sanctions screening. Match transaction parties against watch-lists where exact equality is not enough.
Document deduplication. Detect near-duplicate articles, tickets, or contracts when the original ingestion did not enforce uniqueness.

Operational Tips From Early Adopters

Build a representative dev service first. Carve out a small sample of your golden table and a small sample of your queries, then validate that the top-N matches at limit = 10 look correct before scaling up.
Always include a confidence threshold. The function returns ranked matches even for inputs that have no real match. Set a relevance threshold in the calling SQL so you do not write garbage into your downstream mart.
Watch query embedding cost on free-form text. User-submitted descriptions can be much longer than the queries you tested with. Truncate or summarize the input column before passing it in.
Combine with Streams and Tasks. Trigger a batch run only on new or changed rows rather than rebuilding the whole match table every night.
Audit results. Sample 1% of rows weekly and spot-check the matches. The model is good, not infallible, especially on multi-language inputs.

How Batch Cortex Search Fits Into a Snowflake AI Stack

Cortex Search, Cortex Analyst, and the various AI functions are now overlapping enough that picking the right one matters. A short cheat sheet:

Want a chat assistant over documents? Use interactive Cortex Search with a RAG pattern.
Want to let users ask questions of structured data? Use Cortex Analyst.
Want to fuzzy-match millions of records on a schedule? Use CORTEX_SEARCH_BATCH from a Task.
Want to secure any of the above against prompt injection? Enable our recently-covered Cortex AI Guardrails.

Key Takeaways

Batch Cortex Search reached GA on May 18, 2026. It is a SQL table function (CORTEX_SEARCH_BATCH) that runs many fuzzy-match queries against a Cortex Search Service in one statement.
Use it when you have ~2,000 or more queries per job; below that, the interactive API is typically faster.
You still need to create a Cortex Search Service first with CREATE CORTEX SEARCH SERVICE.
Costs come from three components: serving, query embedding, and warehouse compute. Watch the embedding cost on long free-form text inputs.
The replicas option tunes throughput without changing total serving cost; the default is 2.
Real-world fits: identity resolution, product / address dedup, audience matching, sanctions screening, document dedup.

Ritish, Senior Data Engineer

Ritish helps CelestInfo clients design Snowflake platforms with a focus on AI workloads, large-scale data engineering, and operational reliability.

Frequently Asked Questions

What is Batch Cortex Search?

A Snowflake-managed SQL table function called CORTEX_SEARCH_BATCH that runs many fuzzy-match queries against a Cortex Search Service in a single statement. It became generally available on May 18, 2026.

When should I use Batch Cortex Search instead of the interactive API?

Use it when you have roughly 2,000 or more queries per job. For smaller workloads the interactive Python or REST API is typically faster because batch search has additional startup latency.

Do I still need to create a Cortex Search Service?

Yes. CORTEX_SEARCH_BATCH queries an existing service. You first create it with CREATE CORTEX SEARCH SERVICE on the target table, warehouse, and TARGET_LAG.

What does Batch Cortex Search cost?

Three components per the docs: serving cost (index size and job duration), query embedding cost (tokens embedded across all input queries), and the virtual warehouse compute used to run the job.

Can I tune throughput?

Yes, via the replicas key inside options. Higher values finish jobs faster but do not increase total serving cost. The default is 2.

Burning Questions
About CelestInfo

Simple answers to make things clear.

What is Batch Cortex Search?+

A SQL table function (CORTEX_SEARCH_BATCH) that runs many fuzzy-match queries against a Cortex Search Service in one statement. GA since May 18, 2026.

Batch or interactive search?+

Roughly 2,000 queries or more → batch. Anything smaller → the interactive Python or REST API, which avoids batch startup latency.

Do I still need a Cortex Search Service?+

Yes. Create one with CREATE CORTEX SEARCH SERVICE first; CORTEX_SEARCH_BATCH queries it.

How is it billed?+

Serving cost (index + duration), query embedding cost (tokens embedded), and the virtual warehouse compute used to run the job.

How do I make it faster?+

Set replicas higher in options. Higher replicas finish jobs faster but do not raise serving cost. Default is 2.

Still have questions?

Get Assistance

Batch Cortex Search in Snowflake: Fuzzy Matching Millions of Records in One SQL Statement

Why Batch Search Even Exists

What CORTEX_SEARCH_BATCH Actually Does

The Function Signature

End-to-End Example: Matching User Submissions Against a Golden Catalog

Step 1: Create the golden table you want to match against

Step 2: Create the Cortex Search Service on top of it

Step 3: Create the input table with all the queries you want to run

Step 4: Run the batch search

When to Use Batch vs Interactive Search

Tuning Throughput: The `replicas` Option

How Batch Cortex Search Is Billed

Real-World Use Cases

Operational Tips From Early Adopters

How Batch Cortex Search Fits Into a Snowflake AI Stack

Key Takeaways

Related Articles

Frequently Asked Questions

What is Batch Cortex Search?

When should I use Batch Cortex Search instead of the interactive API?

Do I still need to create a Cortex Search Service?

What does Batch Cortex Search cost?

Can I tune throughput?

Burning Questions
About CelestInfo

Ready? Let's Talk!

Batch Cortex Search in Snowflake: Fuzzy Matching Millions of Records in One SQL Statement

Why Batch Search Even Exists

What CORTEX_SEARCH_BATCH Actually Does

The Function Signature

End-to-End Example: Matching User Submissions Against a Golden Catalog

Step 1: Create the golden table you want to match against

Step 2: Create the Cortex Search Service on top of it

Step 3: Create the input table with all the queries you want to run

Step 4: Run the batch search

When to Use Batch vs Interactive Search

Tuning Throughput: The replicas Option

How Batch Cortex Search Is Billed

Real-World Use Cases

Operational Tips From Early Adopters

How Batch Cortex Search Fits Into a Snowflake AI Stack

Key Takeaways

Related Articles

Frequently Asked Questions

What is Batch Cortex Search?

When should I use Batch Cortex Search instead of the interactive API?

Do I still need to create a Cortex Search Service?

What does Batch Cortex Search cost?

Can I tune throughput?

Burning QuestionsAbout CelestInfo

Tuning Throughput: The `replicas` Option

Burning Questions
About CelestInfo