Snowpipe vs Snowpipe Streaming: Real-Time Data Ingestion in Snowflake
Last updated: December 2025
Quick answer: Snowpipe is file-based ingestion - it watches for new files on S3/GCS/ADLS and loads them via serverless compute with ~1-3 minute latency. Snowpipe Streaming is row-based - it uses the Snowflake Ingest SDK to insert rows directly without staging files, achieving sub-second latency. Use Snowpipe for batch file drops from ETL tools. Use Snowpipe Streaming for Kafka-style streaming, real-time CDC, or IoT sensor data.
Introduction
Snowflake gives you two serverless ingestion mechanisms, and picking the wrong one can cost you either latency or money. Snowpipe has been around since 2018 and handles file-based loading well. Snowpipe Streaming arrived later and works at the row level. They solve different problems, but the naming makes it easy to confuse them. This guide breaks down exactly how each works, what they cost, and when to use which.
How Snowpipe Classic Works
Snowpipe classic is a file-triggered, serverless ingestion service. You put files on cloud storage (S3, GCS, or Azure Blob/ADLS), and Snowpipe detects them and loads them into a Snowflake table. Under the hood, it uses a serverless compute pool managed by Snowflake - you don't provision a warehouse for it.
The File-Based Pipeline
The typical setup looks like this: your ETL tool or application writes files (CSV, JSON, Parquet, etc.) to an S3 bucket. An S3 event notification fires when a new file lands. That notification goes to an SQS queue that Snowpipe monitors. Snowpipe picks up the notification, spins up serverless compute, runs a COPY INTO operation, and marks the file as loaded.
CREATE PIPE my_pipe
AUTO_INGEST = TRUE
AS
COPY INTO my_database.my_schema.my_table
FROM @my_s3_stage
FILE_FORMAT = (TYPE = 'PARQUET');
The AUTO_INGEST = TRUE flag tells Snowpipe to listen for event notifications. Without it, you'd need to call the insertFiles REST API manually to trigger loads.
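To make the manual path concrete, here's a minimal Python sketch of how an insertFiles call could be assembled. This only builds the URL and JSON body; the account identifier, pipe name, and file path are hypothetical placeholders, and the real request also needs an `Authorization: Bearer <JWT>` header signed with your Snowflake key pair, which is omitted here.

```python
import json

def build_insert_files_request(account: str, pipe_fqn: str, staged_paths: list) -> tuple:
    """Build the URL and JSON body for a manual Snowpipe insertFiles call.

    pipe_fqn is the fully qualified pipe name, e.g. "MY_DB.MY_SCHEMA.MY_PIPE".
    Sending the request additionally requires a key-pair-signed JWT in the
    Authorization header -- not shown here.
    """
    url = f"https://{account}.snowflakecomputing.com/v1/data/pipes/{pipe_fqn}/insertFiles"
    body = json.dumps({"files": [{"path": p} for p in staged_paths]})
    return url, body

# Hypothetical account, pipe, and staged file path
url, body = build_insert_files_request(
    "myorg-myaccount",
    "MY_DB.MY_SCHEMA.MY_PIPE",
    ["data/2025/12/01/events_001.json.gz"],
)
```

With AUTO_INGEST = TRUE you never need this; it's only the fallback when event notifications aren't an option.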
Latency and Polling Behavior
Here's the gotcha most people miss: Snowpipe doesn't process files instantly when it gets the SQS notification. It queues the file and processes it on an internal schedule. The minimum end-to-end latency is approximately 60 seconds, and in practice it's 1-3 minutes depending on queue depth and file size. If you're expecting sub-minute latency from Snowpipe classic, you won't get it. It's designed for micro-batch loading, not real-time streaming.
How Snowpipe Streaming Works
Snowpipe Streaming takes a fundamentally different approach. Instead of loading files from cloud storage, it accepts rows directly via the Snowflake Ingest SDK. There's no staging area, no files, no COPY INTO. Your application calls the SDK, passes rows, and they land in the Snowflake table. Latency is measured in seconds, not minutes.
The Row-Based Pipeline
Your application (or the Kafka connector) creates a channel to a target table using the Ingest SDK. It sends rows through the channel. Snowflake buffers them briefly and writes them to the table. The rows are queryable within seconds of being sent. No intermediate files, no staging tables, no COPY command.
The Ingest SDK is Java-only as of early 2026. If you're running a Python pipeline, you can't use Snowpipe Streaming directly - you'd need a Java wrapper service or use the Kafka connector, which handles the SDK integration for you.
Setting Up Snowpipe with S3 Event Notifications
The setup involves four pieces: a storage integration, an external stage, the pipe itself, and the S3 event notification that wires the bucket to Snowpipe's SQS queue.
- Create a storage integration in Snowflake that grants access to your S3 bucket. This creates an IAM trust relationship between Snowflake's AWS account and yours. See our IAM role setup guide for the AWS side.
- Create an external stage pointing to the S3 path where files will land.
- Create the pipe with AUTO_INGEST = TRUE.
- Configure SQS notifications on the S3 bucket. Run SHOW PIPES to get the notification_channel ARN - that's the SQS queue Snowflake created. Add that as the destination in your S3 bucket's event notification configuration.
-- Storage integration
CREATE STORAGE INTEGRATION s3_int
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-role'
STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/data/');
-- External stage
CREATE STAGE my_s3_stage
STORAGE_INTEGRATION = s3_int
URL = 's3://my-bucket/data/';
-- Pipe
CREATE PIPE my_pipe AUTO_INGEST = TRUE AS
COPY INTO raw.events FROM @my_s3_stage
FILE_FORMAT = (TYPE = 'JSON');
-- Get SQS ARN for S3 event config
SHOW PIPES LIKE 'my_pipe';
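The last step - pointing the bucket's event notifications at Snowpipe's SQS queue - can be scripted with boto3. The sketch below builds the notification configuration; the queue ARN and bucket name are hypothetical placeholders (the real ARN is the notification_channel value from SHOW PIPES), and the actual AWS call is shown in a trailing comment since it needs live credentials.

```python
def notification_config(queue_arn: str, prefix: str = "data/") -> dict:
    """Route s3:ObjectCreated events under `prefix` to Snowpipe's SQS queue."""
    return {
        "QueueConfigurations": [{
            "QueueArn": queue_arn,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}},
        }]
    }

# queue_arn below is a placeholder -- use the notification_channel from SHOW PIPES
config = notification_config("arn:aws:sqs:us-east-1:123456789012:sf-snowpipe-queue")

# With boto3 and AWS credentials configured, the call would look like:
# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="my-bucket", NotificationConfiguration=config)
```

Once the notification is in place, every new object under the prefix triggers a load with no further action on your side.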
Configuring the Kafka Connector for Snowpipe Streaming
The Snowflake Kafka connector supports both Snowpipe classic and Snowpipe Streaming as ingestion methods. For Streaming, set snowflake.ingestion.method=SNOWPIPE_STREAMING in your connector configuration. The connector handles the Ingest SDK integration - it reads from Kafka topics and pushes rows through Snowpipe Streaming channels.
{
"name": "snowflake-sink",
"config": {
"connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
"snowflake.url.name": "account.snowflakecomputing.com",
"snowflake.user.name": "kafka_user",
"snowflake.private.key": "${file:/secrets/snowflake_key.pem}",
"snowflake.database.name": "RAW_DB",
"snowflake.schema.name": "KAFKA",
"snowflake.ingestion.method": "SNOWPIPE_STREAMING",
"snowflake.role.name": "KAFKA_ROLE",
"topics": "clickstream,user_events",
"buffer.flush.time": "10",
"buffer.count.records": "10000"
}
}
The buffer.flush.time and buffer.count.records settings control how often the connector flushes rows to Snowflake. Lower values mean lower latency but more overhead. For most use cases, flushing every 10 seconds or every 10,000 records (whichever comes first) is a good starting point.
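The whichever-comes-first flush behavior can be illustrated with a small simulation. This is not the connector's actual code - just a hypothetical buffer that mirrors the semantics of buffer.flush.time (seconds) and buffer.count.records.

```python
import time

class FlushBuffer:
    """Illustrative whichever-comes-first buffer, mirroring the connector's
    buffer.flush.time (seconds) and buffer.count.records settings."""

    def __init__(self, flush_seconds: float = 10.0, max_records: int = 10_000):
        self.flush_seconds = flush_seconds
        self.max_records = max_records
        self.rows = []
        self.opened_at = time.monotonic()

    def add(self, row):
        """Buffer a row; return the batch to flush when either threshold trips."""
        self.rows.append(row)
        count_hit = len(self.rows) >= self.max_records
        time_hit = time.monotonic() - self.opened_at >= self.flush_seconds
        if count_hit or time_hit:
            batch, self.rows = self.rows, []
            self.opened_at = time.monotonic()
            return batch
        return None

# With max_records=3, the third row trips the count threshold and flushes
buf = FlushBuffer(flush_seconds=10.0, max_records=3)
buf.add({"id": 1})
buf.add({"id": 2})
batch = buf.add({"id": 3})
```

The trade-off is visible directly: a small max_records or flush_seconds produces many small flushes (low latency, more overhead), while large values batch more rows per flush.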
Monitoring: COPY_HISTORY and PIPE_STATUS
For Snowpipe classic, COPY_HISTORY in the INFORMATION_SCHEMA shows every file load - status, row count, errors, and load time. SYSTEM$PIPE_STATUS tells you the current state of the pipe: how many files are queued, how many are being processed, and the last time a file was loaded.
-- Recent Snowpipe loads
SELECT * FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
TABLE_NAME => 'MY_TABLE',
START_TIME => DATEADD(hours, -24, CURRENT_TIMESTAMP())
));
-- Pipe status
SELECT SYSTEM$PIPE_STATUS('my_pipe');
-- Snowpipe Streaming file migration history (how buffered rows were compacted)
SELECT * FROM TABLE(INFORMATION_SCHEMA.SNOWPIPE_STREAMING_FILE_MIGRATION_HISTORY(
DATE_RANGE_START => DATEADD(hours, -24, CURRENT_TIMESTAMP())
));
Cost Comparison
This is where the decision gets interesting. Snowpipe classic charges per-file overhead - each file incurs a fixed cost for serverless compute regardless of file size. If you're loading thousands of tiny files (under 100MB each), the per-file overhead adds up fast. Snowpipe Streaming charges per-row, but the per-row cost is very small, and there's no file overhead.
- Snowpipe classic: roughly 0.06 credits per 1,000 files loaded, plus the compute spent copying the data. Best when files are large (100MB+) and arrive in batches. Worst for high-frequency small files - the per-file overhead on 10,000 1KB files is the same as on 10,000 1GB files, so tiny files pay a huge relative premium.
- Snowpipe Streaming: Charged based on compute time for row processing. More cost-effective for high-frequency, small-payload ingestion. A Kafka topic producing 50,000 messages/minute will cost significantly less via Streaming than if you staged those messages as files and loaded them through classic Snowpipe.
The general rule: if your source produces files (ETL output, log rotation, data exports), use Snowpipe classic. If your source produces individual events or records (Kafka, CDC streams, IoT telemetry), use Snowpipe Streaming.
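A back-of-envelope calculation makes the per-file overhead tangible. This uses the approximate 0.06 credits per 1,000 files figure cited above; it models only the fixed per-file charge, not the copy compute, and your actual rates may differ - check your Snowflake bill.

```python
def snowpipe_file_overhead_credits(num_files: int,
                                   credits_per_1000_files: float = 0.06) -> float:
    """Fixed per-file overhead only -- excludes compute spent copying the data."""
    return num_files / 1000 * credits_per_1000_files

# 10,000 tiny files per hour pays the same overhead as 10,000 huge files
hourly_overhead = snowpipe_file_overhead_credits(10_000)
daily_overhead = hourly_overhead * 24
```

At 10,000 files per hour that's 0.6 credits of pure overhead per hour, or 14.4 credits per day, before a single byte of data is counted - which is why chatty small-file sources push you toward Streaming.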
When to Use Snowpipe Classic
- Batch file drops: ETL tools like Talend, Informatica, or ADF that write output files to cloud storage.
- S3 event-driven loading: Applications that write files to S3 and need them loaded automatically. See our guide on loading data from Azure to Snowflake for the ADLS equivalent.
- Log file ingestion: Rotated log files that land on S3 every few minutes.
- Data sharing scenarios: External partners dropping files in a shared bucket. For cross-cloud data sharing patterns, see our data sharing guide.
When to Use Snowpipe Streaming
- Kafka topics: Any Kafka-based pipeline where you need data in Snowflake within seconds. Use the Kafka connector with the SNOWPIPE_STREAMING method.
- Real-time CDC: Change data capture from databases (Debezium, AWS DMS) where you need near-instant replication to Snowflake.
- IoT sensor data: High-frequency, small-payload telemetry. Thousands of sensors sending readings every second.
- Clickstream / user events: Website or app events that need to be queryable for real-time dashboards.
Gotchas and Limitations
- Snowpipe classic has a ~60-second minimum latency. It's not real-time. Don't architect around it if you need sub-minute data freshness. It polls internally and batches file loads.
- Snowpipe Streaming requires the Ingest SDK, which is Java-only. There's no Python SDK yet. If your team is Python-first, you'll need to either run the Kafka connector (which wraps the Java SDK) or write a thin Java service.
- Both have soft limits on concurrent loads per table. Snowpipe classic can handle up to 10,000 files queued per pipe. Snowpipe Streaming caps the number of open channels per table - check the current Snowflake documentation for the exact limit on your account.
- Snowpipe Streaming data isn't immediately optimized. Rows land in micro-partitions that aren't ideally clustered. Snowflake's automatic clustering service will reorganize them over time, but if you query immediately, scan efficiency may be lower than for batch-loaded data.
- Error handling differs significantly. Snowpipe classic gives you ON_ERROR options (CONTINUE, SKIP_FILE, ABORT_STATEMENT). Snowpipe Streaming fails at the row level and requires application-side error handling via the SDK's callback mechanism.
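The application-side pattern for row-level failures usually looks like a dead-letter queue: load what you can, capture what you can't. The sketch below is SDK-agnostic - insert_row is a hypothetical stand-in for whatever per-row insert your wrapper exposes (in the real Java SDK, failures surface through the insert response and callbacks rather than exceptions).

```python
def ingest_with_dead_letter(rows, insert_row, dead_letter):
    """Per-row error handling: good rows go through, bad rows are captured
    with their error instead of aborting the whole batch."""
    loaded = 0
    for row in rows:
        try:
            insert_row(row)
            loaded += 1
        except ValueError as err:  # stand-in for a per-row validation failure
            dead_letter.append({"row": row, "error": str(err)})
    return loaded

# Hypothetical insert that rejects rows missing an "id" column
def fake_insert(row):
    if "id" not in row:
        raise ValueError("missing id")

failed = []
loaded = ingest_with_dead_letter([{"id": 1}, {"bad": True}, {"id": 2}], fake_insert, failed)
```

The dead-letter list can then be written to an error table or replayed after a fix - the key point is that one malformed row never stops the stream.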
Key Takeaways
- Snowpipe classic = file-based, 1-3 minute latency, triggered by cloud storage events. Best for ETL output and batch file drops.
- Snowpipe Streaming = row-based, sub-second latency, uses the Java Ingest SDK. Best for Kafka, CDC, IoT, and event streams.
- Snowpipe classic charges per-file; Streaming charges per-row compute. For high-frequency small payloads, Streaming is cheaper.
- The Kafka connector supports both methods. Set snowflake.ingestion.method to switch between them.
- Monitor with COPY_HISTORY for classic and the Snowpipe Streaming system functions for Streaming.
- Neither is a general-purpose ETL tool. They handle ingestion only - you still need separate compute for transformations.
Frequently Asked Questions
Q: What is the difference between Snowpipe and Snowpipe Streaming?
Snowpipe is file-based: it detects new files on cloud storage (S3, GCS, ADLS) and loads them using serverless compute. Snowpipe Streaming is row-based: it uses the Snowflake Ingest SDK to insert rows directly into tables without staging files, achieving sub-second latency.
Q: What is the minimum latency for Snowpipe?
Snowpipe has a minimum latency of approximately 60 seconds because it polls for new files on an internal schedule. Actual end-to-end latency is typically 1-3 minutes depending on file size and queue depth.
Q: Does Snowpipe Streaming support Python?
As of early 2026, the Snowflake Ingest SDK for Snowpipe Streaming is Java-only. There is no official Python SDK for Snowpipe Streaming yet. Python-based pipelines typically use the Kafka connector with Snowpipe Streaming or fall back to Snowpipe classic.
Q: How is Snowpipe Streaming different from the Kafka connector?
The Snowflake Kafka connector can use Snowpipe Streaming as its underlying ingestion method (via the 'snowflake.ingestion.method=SNOWPIPE_STREAMING' config). The Kafka connector handles the Kafka consumer logic while Snowpipe Streaming handles the Snowflake-side row insertion.
