Snowflake External Tables: Query Your Data Lake Without Loading It
Last updated: April 2025
Quick answer: External tables let Snowflake query files in S3, GCS, or ADLS without copying them into Snowflake storage. You define a metadata layer over the external files, and Snowflake reads them at query time. Queries typically run 2-5x slower than on native tables, but you avoid data duplication and Snowflake storage costs for rarely-accessed datasets.
What External Tables Are
A regular Snowflake table stores data in Snowflake-managed micro-partitions. An external table doesn't store any data. It's a metadata layer that tells Snowflake where to find files in an external cloud storage location (S3 bucket, GCS bucket, or ADLS container), what format they're in, and how to parse them.
When you query an external table, Snowflake reads the files from the external storage at query time. The data never gets copied into Snowflake. This makes external tables ideal for data lake architectures where data needs to be accessible from multiple compute engines - Snowflake, Databricks, Spark, Presto - without duplicating it for each one.
Setting Up: Storage Integration and External Stage
Before creating an external table, you need two things: a storage integration (for authentication) and an external stage (pointing to the file location). Here's the S3 setup:
-- Create a storage integration (ACCOUNTADMIN required)
CREATE OR REPLACE STORAGE INTEGRATION s3_data_lake_int
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-s3-access'
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS = ('s3://my-data-lake/events/');
-- Get the AWS IAM info to configure the trust policy
DESC STORAGE INTEGRATION s3_data_lake_int;
-- Create an external stage pointing at the bucket path
CREATE OR REPLACE STAGE ext_events_stage
URL = 's3://my-data-lake/events/'
STORAGE_INTEGRATION = s3_data_lake_int
FILE_FORMAT = (TYPE = 'PARQUET');
Creating an External Table with Partitioning
The key to usable external table performance is partitioning. If your S3 files are organized by date (e.g., s3://bucket/events/year=2026/month=02/day=27/), you can define partition columns that let Snowflake prune files at query time instead of scanning everything.
CREATE OR REPLACE EXTERNAL TABLE analytics.ext_events (
-- Partition column derived from the file path. Assumes Hive-style paths
-- like year=2026/month=02/day=27/file.parquet relative to the stage URL;
-- the inner SPLIT_PART strips the "year="/"month="/"day=" key prefixes.
event_date DATE AS (
  TO_DATE(
    SPLIT_PART(SPLIT_PART(metadata$filename, '/', 1), '=', 2) || '-' ||
    SPLIT_PART(SPLIT_PART(metadata$filename, '/', 2), '=', 2) || '-' ||
    SPLIT_PART(SPLIT_PART(metadata$filename, '/', 3), '=', 2),
    'YYYY-MM-DD'
  )
),
-- Data columns from Parquet files
event_id VARCHAR AS (value:event_id::VARCHAR),
user_id INT AS (value:user_id::INT),
event_type VARCHAR AS (value:event_type::VARCHAR),
properties VARIANT AS (value:properties::VARIANT),
event_timestamp TIMESTAMP_NTZ AS (value:event_timestamp::TIMESTAMP_NTZ)
)
PARTITION BY (event_date)
LOCATION = @ext_events_stage
AUTO_REFRESH = TRUE
FILE_FORMAT = (TYPE = 'PARQUET');
The PARTITION BY (event_date) clause tells Snowflake to use the event_date column for partition pruning. A query with WHERE event_date = '2026-02-27' only reads files from that day's folder, skipping everything else.
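For example, a rollup over the table defined above that benefits from pruning (illustrative query; table and column names come from the earlier definition):

```sql
-- Only files under year=2026/month=02/day=27/ are read
SELECT event_type, COUNT(*) AS events
FROM analytics.ext_events
WHERE event_date = '2026-02-27'
GROUP BY event_type;
```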
Auto-Refresh: Keeping Metadata Current
With AUTO_REFRESH = TRUE, Snowflake uses cloud event notifications to detect new files:
- S3: SQS (Simple Queue Service) notifications triggered by new object creation.
- GCS: Pub/Sub notifications.
- ADLS: Azure Event Grid notifications.
When a new file lands in the S3 bucket, Snowflake picks up the SQS event, updates the external table's metadata, and the file becomes queryable. This process is eventually consistent - expect a 1-2 minute delay between file arrival and query availability.
For the S3 setup, Snowflake provides the SQS queue ARN after you create the external table. You configure the S3 bucket's event notification to send to that queue:
-- Get the notification channel details
DESC EXTERNAL TABLE analytics.ext_events;
If auto-refresh isn't configured, you'll need to manually refresh the metadata:
ALTER EXTERNAL TABLE analytics.ext_events REFRESH;
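To verify which files the external table's metadata currently includes, you can query the registered file list (a sketch using the INFORMATION_SCHEMA table function; exact output columns may vary by Snowflake version):

```sql
-- List the files currently registered in the external table's metadata
SELECT *
FROM TABLE(
  INFORMATION_SCHEMA.EXTERNAL_TABLE_FILES(TABLE_NAME => 'analytics.ext_events')
);
```

If a file that landed in S3 is missing from this list, the event notification pipeline is the first place to look.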
Performance: External vs Native Tables
External tables are fundamentally slower than native Snowflake tables. Here's why:
- No micro-partition optimization: Snowflake can't apply its columnar compression, min/max pruning, or clustering optimization to external files.
- Network access on every query: External table queries read data over the network from S3/GCS/ADLS each time. Native tables benefit from the warehouse's local SSD cache on repeated scans.
- No result caching: Snowflake's result cache doesn't apply to external table queries.
In practice, expect external table queries to run 2-5x slower than the same query on a native table. Parquet files perform better than CSV because Snowflake can push down column pruning into the file format. Properly partitioned external tables with narrow date-filtered queries can approach native performance.
Materialized Views Over External Tables
If you need better query performance on frequently-accessed external data, create a materialized view over the external table. Two definition limitations to keep in mind: materialized views can't reference non-deterministic functions such as CURRENT_DATE(), so rolling date filters belong in the queries against the view rather than in its definition, and DISTINCT can't be combined with aggregate functions, so APPROX_COUNT_DISTINCT is the usual substitute for COUNT(DISTINCT):
CREATE OR REPLACE MATERIALIZED VIEW analytics.mv_daily_events AS
SELECT
  event_date,
  event_type,
  COUNT(*) AS event_count,
  APPROX_COUNT_DISTINCT(user_id) AS approx_unique_users
FROM analytics.ext_events
GROUP BY event_date, event_type;
The materialized view stores precomputed results in Snowflake's native storage, so queries against it run at native table speed. Snowflake refreshes the MV automatically when it detects changes in the external table's metadata, though with a delay. Note that materialized views require Enterprise Edition or higher.
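A query against the MV might look like this (illustrative; rolling date filters go here, in the query, rather than in the view definition):

```sql
-- Served from the MV's precomputed native storage, not the external files
SELECT event_date, SUM(event_count) AS total_events
FROM analytics.mv_daily_events
WHERE event_date >= DATEADD('day', -7, CURRENT_DATE())
GROUP BY event_date
ORDER BY event_date;
```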
When to Use External Tables
Good Use Cases
- Data lake federation: Query data that's shared between Snowflake, Databricks, and Spark without duplicating it.
- Cost savings for cold data: Keep rarely-queried historical data in cheaper S3 storage instead of paying Snowflake storage rates.
- Compliance requirements: When data must remain in a specific storage location for regulatory reasons.
- ELT staging: Use external tables as a read-only view of landing zone files before running COPY INTO to load them into native tables.
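The ELT staging pattern might look like the following sketch. The native target table analytics.events_native is hypothetical, and MATCH_BY_COLUMN_NAME assumes the Parquet column names match the table's:

```sql
-- Create a native target table for fast repeated access
CREATE TABLE IF NOT EXISTS analytics.events_native (
  event_id VARCHAR,
  user_id INT,
  event_type VARCHAR,
  properties VARIANT,
  event_timestamp TIMESTAMP_NTZ
);

-- Load the landing-zone Parquet files from the stage
COPY INTO analytics.events_native
FROM @ext_events_stage
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```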
Bad Use Cases
- High-frequency queries: If the same table gets queried 100+ times per day, the performance penalty adds up. Load it natively.
- Interactive dashboards: Dashboard users expect sub-second response times, which external tables generally can't deliver.
- DML requirements: You can't INSERT, UPDATE, or DELETE on external tables. If you need write access, use native tables.
- Complex joins: Joining external tables with native tables forces Snowflake to read the external data for every query execution.
The Partition Pruning Gotcha
File-level pruning on external tables only works if your partition columns are properly defined and your queries filter on them. If you define an external table without partition columns, Snowflake scans every file in the stage for every query - even if you only need yesterday's data.
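Two queries against the table defined earlier illustrate the difference:

```sql
-- Prunes: filters on the partition column, reads only one day's files
SELECT COUNT(*) FROM analytics.ext_events
WHERE event_date = '2026-02-27';

-- Scans everything: event_timestamp is a regular data column,
-- so Snowflake can't skip any files even though the intent is the same
SELECT COUNT(*) FROM analytics.ext_events
WHERE event_timestamp >= '2026-02-27'::TIMESTAMP_NTZ;
```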
The metadata refresh is eventually consistent. After a new file arrives in S3, there's a 1-2 minute window where Snowflake doesn't know about it. If you have a pipeline that writes a file and immediately queries the external table, you might miss the new data. Build in a short delay (2-3 minutes) or use a manual REFRESH before querying.
Key Takeaways
- External tables are a metadata layer over S3/GCS/ADLS files. Data stays in external storage and is read at query time.
- Performance is 2-5x slower than native tables. Use partitioning and Parquet files to minimize the gap.
- Auto-refresh uses cloud event notifications (SQS, Pub/Sub, Event Grid) and is eventually consistent with a 1-2 minute delay.
- Materialized views over external tables give you native query speed for frequently-accessed aggregations.
- Use external tables for data lake federation, cold data querying, and compliance scenarios. Don't use them for interactive dashboards or high-frequency queries.
- Always define partition columns. Without them, Snowflake scans every file for every query.
Frequently Asked Questions
Q: How fast are Snowflake external tables compared to native tables?
External tables are typically 2-5x slower than native tables for the same query, because Snowflake must read data from the external storage (S3/GCS/ADLS) rather than from its own optimized micro-partition storage. The gap narrows with Parquet files and proper partitioning.
Q: Can I run DML on Snowflake external tables?
No. External tables are read-only in Snowflake. You can SELECT from them and use them in joins, but you cannot INSERT, UPDATE, or DELETE rows. To modify the data, you must update the files in the external storage directly.
Q: How does auto-refresh work for Snowflake external tables?
Auto-refresh uses cloud event notifications (SQS for S3, GCS Pub/Sub, or Azure Event Grid) to detect new files in the external stage. When a new file lands, Snowflake automatically updates the external table's metadata. This is eventually consistent - there can be a 1-2 minute delay.
Q: When should I use external tables instead of loading data into Snowflake?
Use external tables for rarely-queried data, data lake federation across Snowflake and Spark/Databricks, cost savings when you don't want to pay for Snowflake storage, or when the data must stay in its original location for compliance or multi-tool access.
