Snowflake External Tables: Query Your Data Lake Without Loading It

Celestinfo Software Solutions Pvt. Ltd. Mar 27, 2025

Last updated: April 2025

Quick answer: External tables let Snowflake query files in S3, GCS, or ADLS without copying them into Snowflake storage. You define a metadata layer over the external files, and Snowflake reads them at query time. Performance is 2-5x slower than native tables, but you avoid data duplication and storage costs for rarely-accessed datasets.

What External Tables Are


A regular Snowflake table stores data in Snowflake-managed micro-partitions. An external table doesn't store any data. It's a metadata layer that tells Snowflake where to find files in an external cloud storage location (S3 bucket, GCS bucket, or ADLS container), what format they're in, and how to parse them.


When you query an external table, Snowflake reads the files from the external storage at query time. The data never gets copied into Snowflake. This makes external tables ideal for data lake architectures where data needs to be accessible from multiple compute engines - Snowflake, Databricks, Spark, Presto - without duplicating it for each one.


Setting Up: Storage Integration and External Stage


Before creating an external table, you need two things: a storage integration (for authentication) and an external stage (pointing to the file location). Here's the S3 setup:


SQL - Step 1: Storage Integration
-- Create a storage integration (ACCOUNTADMIN required)
CREATE OR REPLACE STORAGE INTEGRATION s3_data_lake_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-s3-access'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-data-lake/events/');

-- Get the AWS IAM info to configure the trust policy
DESC STORAGE INTEGRATION s3_data_lake_int;

SQL - Step 2: External Stage
CREATE OR REPLACE STAGE ext_events_stage
  URL = 's3://my-data-lake/events/'
  STORAGE_INTEGRATION = s3_data_lake_int
  FILE_FORMAT = (TYPE = 'PARQUET');
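Before moving on, it's worth a quick sanity check that the stage can actually reach the bucket (assuming the IAM trust policy from step 1 is already configured):

SQL
-- List the files visible through the stage; an access error or empty
-- result here means the integration or trust policy needs fixing
LIST @ext_events_stage;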

Creating an External Table with Partitioning


The key to usable external table performance is partitioning. If your S3 files are organized by date (e.g., s3://bucket/events/year=2026/month=02/day=27/), you can define partition columns that let Snowflake prune files at query time instead of scanning everything.


SQL - External Table with Date Partitioning
CREATE OR REPLACE EXTERNAL TABLE analytics.ext_events (
  -- Partition column derived from the Hive-style file path.
  -- metadata$filename is assumed here to look like
  -- 'events/year=2026/month=02/day=27/part-0001.parquet';
  -- verify the exact segment positions with: SELECT DISTINCT metadata$filename
  event_date DATE AS (
    TO_DATE(
      SPLIT_PART(SPLIT_PART(metadata$filename, '/', 2), '=', 2) || '-' ||
      SPLIT_PART(SPLIT_PART(metadata$filename, '/', 3), '=', 2) || '-' ||
      SPLIT_PART(SPLIT_PART(metadata$filename, '/', 4), '=', 2),
      'YYYY-MM-DD'
    )
  ),
  -- Data columns from the Parquet files (value is each row as a VARIANT)
  event_id VARCHAR AS (value:event_id::VARCHAR),
  user_id INT AS (value:user_id::INT),
  event_type VARCHAR AS (value:event_type::VARCHAR),
  properties VARIANT AS (value:properties),
  event_timestamp TIMESTAMP_NTZ AS (value:event_timestamp::TIMESTAMP_NTZ)
)
PARTITION BY (event_date)
LOCATION = @ext_events_stage
AUTO_REFRESH = TRUE
FILE_FORMAT = (TYPE = 'PARQUET');

The PARTITION BY (event_date) clause tells Snowflake to use the event_date column for partition pruning. A query with WHERE event_date = '2026-02-27' only reads files from that day's folder, skipping everything else.
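To sketch what pruning looks like in practice, here is a hypothetical date-filtered query, plus a way to confirm which files back a partition via the metadata$filename pseudocolumn:

SQL
-- Only files under year=2026/month=02/day=27/ are read
SELECT event_type, COUNT(*) AS events
FROM analytics.ext_events
WHERE event_date = '2026-02-27'
GROUP BY event_type;

-- Inspect which files contributed rows to this partition
SELECT DISTINCT metadata$filename
FROM analytics.ext_events
WHERE event_date = '2026-02-27';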


Auto-Refresh: Keeping Metadata Current


With AUTO_REFRESH = TRUE, Snowflake uses cloud event notifications (SQS for S3, Pub/Sub for GCS, Event Grid for Azure) to detect new files.



When a new file lands in the S3 bucket, Snowflake picks up the SQS event, updates the external table's metadata, and the file becomes queryable. This process is eventually consistent - expect a 1-2 minute delay between file arrival and query availability.


For the S3 setup, Snowflake provides the SQS queue ARN after you create the external table. You configure the S3 bucket's event notification to send to that queue:

SQL
-- Get the notification channel details
DESC EXTERNAL TABLE analytics.ext_events;

If auto-refresh isn't configured, you'll need to manually refresh the metadata:

SQL
ALTER EXTERNAL TABLE analytics.ext_events REFRESH;
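When you suspect the metadata is lagging behind the bucket, the EXTERNAL_TABLE_FILES table function reports which files Snowflake currently has registered:

SQL
-- Files Snowflake knows about right now, with size and last-modified time
SELECT file_name, file_size, last_modified
FROM TABLE(
  INFORMATION_SCHEMA.EXTERNAL_TABLE_FILES(TABLE_NAME => 'analytics.ext_events')
);

Comparing this list against the bucket's actual contents tells you whether a REFRESH is needed.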

Performance: External vs Native Tables


External tables are fundamentally slower than native Snowflake tables. Every query reads files over the network from external storage rather than from Snowflake's optimized micro-partition storage, and Snowflake lacks the per-micro-partition statistics it normally uses for pruning, so file skipping is limited to whatever partition columns you define.



In practice, expect external table queries to run 2-5x slower than the same query on a native table. Parquet files perform better than CSV because Snowflake can push down column pruning into the file format. Properly partitioned external tables with narrow date-filtered queries can approach native performance.


Materialized Views Over External Tables


If you need better query performance on frequently-accessed external data, create a materialized view over the external table. Snowflake caches the results and automatically refreshes them when the underlying data changes:


SQL
-- Note: Snowflake materialized views disallow non-deterministic functions
-- such as CURRENT_DATE(), so the view aggregates all dates and queries
-- filter at read time. DISTINCT also can't be combined with aggregates
-- in an MV; APPROX_COUNT_DISTINCT is the supported alternative.
CREATE OR REPLACE MATERIALIZED VIEW analytics.mv_daily_events AS
SELECT
  event_date,
  event_type,
  COUNT(*) AS event_count,
  APPROX_COUNT_DISTINCT(user_id) AS approx_unique_users
FROM analytics.ext_events
GROUP BY event_date, event_type;

The materialized view stores precomputed results in Snowflake's native storage. Queries against the MV run at native table speed. Snowflake auto-refreshes the MV when it detects new files in the external table, though there's a delay.
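Downstream dashboards can then read from the MV instead of the external table, for example:

SQL
-- Runs at native-table speed; no external storage reads
SELECT event_date, event_type, event_count
FROM analytics.mv_daily_events
WHERE event_date >= '2026-02-01'
ORDER BY event_date, event_count DESC;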


When to Use External Tables


Good Use Cases

- Rarely-queried cold data that doesn't justify paying for Snowflake storage
- Data lake federation, where the same files must be readable from Snowflake, Spark, Databricks, and Presto
- Data that must stay in its original location for compliance or multi-tool access

Bad Use Cases

- Hot, frequently-queried data, where the 2-5x performance penalty compounds
- Anything that needs DML: external tables are read-only
- Pipelines that need immediate read-after-write visibility, given the 1-2 minute metadata refresh lag

The Partition Pruning Gotcha


File-level pruning on external tables only works if your partition columns are properly defined and your queries filter on them. If you define an external table without partition columns, Snowflake scans every file in the stage for every query - even if you only need yesterday's data.
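A concrete illustration of the difference, using the ext_events table defined earlier:

SQL
-- Pruned: event_date is the partition column, so only matching files are read
SELECT COUNT(*) FROM analytics.ext_events
WHERE event_date = '2026-02-27';

-- Not pruned: event_timestamp is a data column inside the files, so
-- Snowflake must open every file in the stage to evaluate the filter
SELECT COUNT(*) FROM analytics.ext_events
WHERE event_timestamp >= '2026-02-27'::TIMESTAMP_NTZ;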


The metadata refresh is eventually consistent. After a new file arrives in S3, there's a 1-2 minute window where Snowflake doesn't know about it. If you have a pipeline that writes a file and immediately queries the external table, you might miss the new data. Build in a short delay (2-3 minutes) or use a manual REFRESH before querying.
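In a write-then-read pipeline, one hedge is to force a metadata refresh immediately before the read step, for example:

SQL
-- Pipeline step: sync the metadata, then query the fresh partition
ALTER EXTERNAL TABLE analytics.ext_events REFRESH;

SELECT COUNT(*) AS rows_visible
FROM analytics.ext_events
WHERE event_date = CURRENT_DATE();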


Key Takeaways

- External tables are a metadata layer over files in S3, GCS, or ADLS; no data is copied into Snowflake.
- Expect queries to run 2-5x slower than on native tables; Parquet files and partition pruning narrow the gap.
- Define partition columns from the file path and filter on them, or every query scans every file in the stage.
- AUTO_REFRESH keeps metadata current via cloud event notifications with a 1-2 minute lag; use ALTER EXTERNAL TABLE ... REFRESH when freshness matters.
- For frequently-accessed aggregates, a materialized view over the external table restores native-speed reads.

Frequently Asked Questions

Q: How fast are Snowflake external tables compared to native tables?

External tables are typically 2-5x slower than native tables for the same query, because Snowflake must read data from the external storage (S3/GCS/ADLS) rather than from its own optimized micro-partition storage. The gap narrows with Parquet files and proper partitioning.

Q: Can I run DML on Snowflake external tables?

No. External tables are read-only in Snowflake. You can SELECT from them and use them in joins, but you cannot INSERT, UPDATE, or DELETE rows. To modify the data, you must update the files in the external storage directly.

Q: How does auto-refresh work for Snowflake external tables?

Auto-refresh uses cloud event notifications (SQS for S3, Pub/Sub for GCS, or Event Grid for Azure) to detect new files in the external stage. When a new file lands, Snowflake automatically updates the external table's metadata. This is eventually consistent - there can be a 1-2 minute delay.

Q: When should I use external tables instead of loading data into Snowflake?

Use external tables for rarely-queried data, data lake federation across Snowflake and Spark/Databricks, cost savings when you don't want to pay for Snowflake storage, or when the data must stay in its original location for compliance or multi-tool access.

Pranay Vatsal, Founder & CEO

Pranay Vatsal is the Founder & CEO of CelestInfo with deep expertise in Snowflake, data architecture, and building production-grade data systems for global enterprises.
