Batch Cortex Search in Snowflake: Fuzzy Matching Millions of Records in One SQL Statement
Quick answer: Batch Cortex Search is a Snowflake-managed SQL table function called CORTEX_SEARCH_BATCH that runs many fuzzy-match queries against a Cortex Search Service in a single statement. It went generally available on May 18, 2026 and is purpose-built for entity resolution, customer identity matching, deduplication, and bulk enrichment workloads where you would otherwise loop over thousands of REST calls. Per Snowflake's documentation, jobs over roughly 2,000 queries see significantly higher throughput than the interactive API.
Last updated: May 2026
Why Batch Search Even Exists
Cortex Search, covered in our Cortex AI guide, has always been a great fit for one-at-a-time queries: a chat assistant looks up a single user's question and gets ranked results back in tens of milliseconds. That request/response shape is exactly what the interactive Python and REST surfaces are tuned for.
The problem starts when you flip the workload around. Picture a CRM cleanup job. You have 750,000 user-submitted addresses and a master address table. You want to find the closest match in master for each submission, scored, with a confidence threshold. Calling the interactive API 750,000 times burns through your network, your warehouse, your patience, and your wallet. People have been writing their own batching wrappers in Python for years.
Batch Cortex Search is Snowflake replacing that homegrown wrapper with a first-class SQL table function. You pass an input table, the engine fans out the queries against a dedicated batch compute pool, and you get a fully joined result set back.
What CORTEX_SEARCH_BATCH Actually Does
From the official documentation: "The CORTEX_SEARCH_BATCH function allows jobs submitted to a Cortex Search Service to leverage additional compute resources to provide significantly higher throughput than the interactive API search query surfaces."
In plain language, it is a lateral table function. You give it the name of a search service, a column of query strings, and optional filters, and it returns one ranked result row per input query (or per result if limit > 1).
The Function Signature
Snowflake's documentation defines the table function as follows:
CORTEX_SEARCH_BATCH(
service_name => '<database>.<schema>.<cortex_search_service>',
query => q.query, -- optional STRING
multi_index_query => q.miq, -- optional VARIANT
filter => q.filter, -- optional VARIANT
limit => 10, -- optional INT, default 10
options => q.options -- optional VARIANT
)
Parameter notes (from the Snowflake docs):
service_name(required): fully-qualified name of the Cortex Search Service to query.query(optional): column containing the query string for searching the service.multi_index_query(optional): an object that specifies one or more vector or keyword query inputs, with a maximum of one vector index entry per query for performance.filter(optional): column containing filter objects to apply to the search results.limit(optional): maximum number of results to return per query. Default is 10.options(optional): supportsscoring_configandreplicas(default 2).
At least one of query, multi_index_query, or filter must be specified.
End-to-End Example: Matching User Submissions Against a Golden Catalog
The official docs show a clean example of product-catalog cleanup. Here it is, exactly as documented, so you can paste it into a Snowsight worksheet and follow along.
Step 1: Create the golden table you want to match against
CREATE OR REPLACE TABLE golden_catalog (product_name TEXT);
INSERT INTO golden_catalog VALUES
('Wireless Bluetooth Headphones'),
('Wireless Noise-Canceling Earbuds'),
('USB-C Charging Cable 6ft'),
('Portable Power Bank 10000mAh');
Step 2: Create the Cortex Search Service on top of it
CREATE CORTEX SEARCH SERVICE golden_product_service ON product_name WAREHOUSE = <warehouse_name> TARGET_LAG = '1 day' AS SELECT product_name FROM golden_catalog;
The search service is responsible for indexing the column and refreshing the index according to TARGET_LAG. The batch function does not build the index itself; it queries this existing service.
Step 3: Create the input table with all the queries you want to run
CREATE OR REPLACE TABLE submitted_products (product TEXT);
INSERT INTO submitted_products VALUES
('bluetooth headphones wireless'),
('usb c cable');
Step 4: Run the batch search
SELECT
q.product, s.*
FROM submitted_products AS q,
LATERAL CORTEX_SEARCH_BATCH(
service_name => 'golden_product_service',
query => q.product,
limit => 2,
options => OBJECT_CONSTRUCT(
'scoring_config', OBJECT_CONSTRUCT(
'disable_vector_embedding_query_prefix', true
)
)
) AS s;
That single statement runs both submitted queries against the search service and returns the top 2 matches for each, joined back to the original input. Swap the two-row input for a million-row table and you have a production-grade entity resolution pipeline.
When to Use Batch vs Interactive Search
Snowflake gives a useful rule of thumb in the documentation: if you have fewer than about 2,000 queries to run, "you'll typically get faster results using the interactive Cortex Search API (Python or REST API) rather than batch search." Batch has additional startup latency that is amortized over high-volume jobs.
| Scenario | Recommended surface |
|---|---|
| Chat assistant looking up a single user question | Interactive (Python / REST) |
| Sub-2,000 queries per job | Interactive (Python / REST) |
| Customer identity reconciliation across datasets | Batch (CORTEX_SEARCH_BATCH) |
| Product / address / company-name deduplication | Batch |
| Bulk enrichment of incoming data feeds | Batch |
| Nightly clustering or relationship-discovery job | Batch |
Tuning Throughput: The replicas Option
The options VARIANT accepts a replicas key. The default is 2. Higher values "complete jobs faster but don't increase total serving cost," per the docs. The serving cost is set by the index size and total job duration; replicas only change the parallelism, not the bill.
A reasonable approach: leave replicas at the default for the first run, look at the wall-clock time in QUERY_HISTORY, and only bump it if the latency is hurting downstream SLAs.
How Batch Cortex Search Is Billed
There are three distinct cost components, all spelled out in the docs:
- Serving cost — charged based on the size of the search index data and the duration of the batch search job.
- Query embedding cost — a charge for the number of tokens embedded as a result of the input queries. Interactive Cortex Search does not charge this separately, so it is worth tracking the input size of your batch job.
- Virtual warehouse cost — the warehouse compute used to run the batch job. Pick a warehouse sized for the orchestration overhead, not for the search itself. The serving cluster does the heavy lifting.
If you already model Cortex costs using our cost optimization guide, add a new line item for batch search and watch it for the first few weeks after rollout. We have seen teams underestimate the embedding cost on noisy text inputs.
Real-World Use Cases
- Customer 360 identity resolution. Fuzzy-match incoming events against a canonical customer index across emails, names, and addresses. Run nightly to feed your dbt mart layer.
- Product catalog cleanup. Reconcile SKUs from many supplier feeds against a master catalog. Pre-compute match candidates and a confidence score, then leave human review for the rows below a threshold.
- Audience matching for advertising. Link customer identities across first-party and partner datasets when only inexact attributes are shared.
- Compliance and sanctions screening. Match transaction parties against watch-lists where exact equality is not enough.
- Document deduplication. Detect near-duplicate articles, tickets, or contracts when the original ingestion did not enforce uniqueness.
Operational Tips From Early Adopters
- Build a representative dev service first. Carve out a small sample of your golden table and a small sample of your queries, then validate that the top-N matches at
limit = 10look correct before scaling up. - Always include a confidence threshold. The function returns ranked matches even for inputs that have no real match. Set a relevance threshold in the calling SQL so you do not write garbage into your downstream mart.
- Watch query embedding cost on free-form text. User-submitted descriptions can be much longer than the queries you tested with. Truncate or summarize the input column before passing it in.
- Combine with Streams and Tasks. Trigger a batch run only on new or changed rows rather than rebuilding the whole match table every night.
- Audit results. Sample 1% of rows weekly and spot-check the matches. The model is good, not infallible, especially on multi-language inputs.
How Batch Cortex Search Fits Into a Snowflake AI Stack
Cortex Search, Cortex Analyst, and the various AI functions are now overlapping enough that picking the right one matters. A short cheat sheet:
- Want a chat assistant over documents? Use interactive Cortex Search with a RAG pattern.
- Want to let users ask questions of structured data? Use Cortex Analyst.
- Want to fuzzy-match millions of records on a schedule? Use
CORTEX_SEARCH_BATCHfrom a Task. - Want to secure any of the above against prompt injection? Enable our recently-covered Cortex AI Guardrails.
Key Takeaways
- Batch Cortex Search reached GA on May 18, 2026. It is a SQL table function (
CORTEX_SEARCH_BATCH) that runs many fuzzy-match queries against a Cortex Search Service in one statement. - Use it when you have ~2,000 or more queries per job; below that, the interactive API is typically faster.
- You still need to create a Cortex Search Service first with
CREATE CORTEX SEARCH SERVICE. - Costs come from three components: serving, query embedding, and warehouse compute. Watch the embedding cost on long free-form text inputs.
- The
replicasoption tunes throughput without changing total serving cost; the default is 2. - Real-world fits: identity resolution, product / address dedup, audience matching, sanctions screening, document dedup.
Related Articles
Frequently Asked Questions
What is Batch Cortex Search?
A Snowflake-managed SQL table function called CORTEX_SEARCH_BATCH that runs many fuzzy-match queries against a Cortex Search Service in a single statement. It became generally available on May 18, 2026.
When should I use Batch Cortex Search instead of the interactive API?
Use it when you have roughly 2,000 or more queries per job. For smaller workloads the interactive Python or REST API is typically faster because batch search has additional startup latency.
Do I still need to create a Cortex Search Service?
Yes. CORTEX_SEARCH_BATCH queries an existing service. You first create it with CREATE CORTEX SEARCH SERVICE on the target table, warehouse, and TARGET_LAG.
What does Batch Cortex Search cost?
Three components per the docs: serving cost (index size and job duration), query embedding cost (tokens embedded across all input queries), and the virtual warehouse compute used to run the job.
Can I tune throughput?
Yes, via the replicas key inside options. Higher values finish jobs faster but do not increase total serving cost. The default is 2.
Still have questions?
Get AssistanceReady? Let's Talk!
Get expert insights and answers tailored to your business requirements and transformation.
Get Assistance