Migrating from Redshift to Snowflake: A Step-by-Step Playbook
Last updated: March 2025
Quick answer: UNLOAD your Redshift data to S3 as Parquet, COPY INTO Snowflake, translate Redshift-specific SQL (DISTKEY/SORTKEY/DISTSTYLE become unnecessary, GETDATE() becomes CURRENT_TIMESTAMP(), CONVERT becomes TO_CHAR/TO_DATE), run row-count and hash-based validation, then cutover. Plan 8-16 weeks. The biggest surprise will be type casting errors - Redshift's implicit casting is far more permissive than Snowflake's.
Why Teams Leave Redshift
Nobody migrates a data warehouse for fun. Teams leave Redshift because of real operational pain: cluster resizing requires downtime (or Elastic Resize, which still has limitations on node type changes), concurrency limits force users to wait in WLM queues during peak hours, and the VACUUM/ANALYZE maintenance cycle is a full-time job that never gets done. Snowflake's separation of storage and compute, automatic clustering, and per-query concurrency model solve these specific problems. That's the pitch. Here's how to actually do the migration.
Phase 1: Schema Audit
Before you move a single row, you need a complete inventory of what you're migrating. Run these queries against Redshift's system catalog:
-- List all tables with row counts and sizes
SELECT "schema", "table",
       tbl_rows AS row_count,
       size     AS size_mb
FROM SVV_TABLE_INFO
ORDER BY size DESC;
-- Identify DISTKEY/SORTKEY/DISTSTYLE per table
SELECT "table", diststyle, sortkey1, sortkey_num
FROM SVV_TABLE_INFO
WHERE "schema" = 'public';
-- Find views and their definitions
SELECT schemaname, viewname, definition
FROM pg_views
WHERE schemaname NOT IN ('pg_catalog','information_schema');
Pay special attention to: tables with DISTSTYLE KEY (these relied on co-located joins - Snowflake doesn't need this), tables with compound or interleaved sort keys (Snowflake uses automatic micro-partition pruning instead), and any views using Redshift-specific functions.
Phase 2: SQL Translation
This is where most of the work happens. Here are the Redshift SQL patterns that need to change:
DDL Changes
- DISTKEY / SORTKEY / DISTSTYLE: Remove entirely. Snowflake doesn't use distribution keys. Drop these clauses from all CREATE TABLE statements.
- ENCODE: Remove column compression encodings (e.g., ENCODE lzo, ENCODE zstd). Snowflake handles compression automatically.
- IDENTITY columns: Replace IDENTITY(1,1) with Snowflake's AUTOINCREMENT or IDENTITY (same keyword, slightly different syntax).
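As a concrete before/after sketch (the table and column names here are hypothetical), a typical Redshift CREATE TABLE translates like this:

```sql
-- Redshift DDL (hypothetical orders table)
CREATE TABLE public.orders (
    order_id   BIGINT IDENTITY(1,1),
    user_id    BIGINT ENCODE az64,
    status     VARCHAR(20) ENCODE lzo,
    order_date DATE
)
DISTSTYLE KEY
DISTKEY (user_id)
SORTKEY (order_date);

-- Snowflake equivalent: distribution, sort, and encoding clauses dropped
CREATE TABLE public.orders (
    order_id   BIGINT IDENTITY(1,1),  -- or: BIGINT AUTOINCREMENT
    user_id    BIGINT,
    status     VARCHAR(20),
    order_date DATE
);
```

Everything Redshift needed the physical-layout clauses for (co-located joins, zone maps) is handled by Snowflake's micro-partitioning, so the translated DDL is strictly simpler.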
Function Translations
-- GETDATE() -> CURRENT_TIMESTAMP()
-- Redshift:  SELECT GETDATE();
-- Snowflake: SELECT CURRENT_TIMESTAMP();

-- CONVERT(type, value) -> TO_CHAR / TO_DATE / CAST
-- (Note: Redshift's CONVERT takes no SQL Server-style format code)
-- Redshift:  SELECT CONVERT(VARCHAR, order_date);
-- Snowflake: SELECT TO_CHAR(order_date, 'YYYY-MM-DD');

-- DATEADD: same syntax, but translate the GETDATE() argument
-- Redshift:  SELECT DATEADD(day, 7, GETDATE());
-- Snowflake: SELECT DATEADD(day, 7, CURRENT_TIMESTAMP());

-- DATEDIFF
-- Redshift:  SELECT DATEDIFF(hour, start_ts, end_ts);
-- Snowflake: SELECT DATEDIFF(hour, start_ts, end_ts);
-- (Same syntax, but watch for timestamp precision differences)

-- LISTAGG
-- Redshift:  SELECT LISTAGG(name, ',') WITHIN GROUP (ORDER BY name)
-- Snowflake: SELECT LISTAGG(name, ',') WITHIN GROUP (ORDER BY name)
-- (Same syntax, but Snowflake caps each result at 16MB)

-- NVL2 is supported in both, but check edge cases
-- ISNULL -> use NVL or COALESCE in Snowflake

-- Approximate count distinct
-- Redshift:  SELECT APPROXIMATE COUNT(DISTINCT user_id) FROM events;
-- Snowflake: SELECT APPROX_COUNT_DISTINCT(user_id) FROM events;
The biggest gotcha: Redshift's implicit type casting is far more permissive than Snowflake's. Redshift silently casts a VARCHAR to a NUMBER in a WHERE clause. Snowflake will throw Numeric value 'abc' is not recognized. You'll discover type mismatches you never knew existed in your data. Run your test queries against Snowflake early and fix casting issues before cutover. Use TRY_CAST() and TRY_TO_NUMBER() to handle dirty data gracefully.
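A sketch of the defensive pattern (the payments table and legacy_amount column are hypothetical):

```sql
-- Fails in Snowflake if any legacy_amount value isn't numeric:
--   SELECT * FROM payments WHERE legacy_amount > 100;
-- Defensive version: non-numeric strings become NULL instead of erroring
SELECT *
FROM payments
WHERE TRY_TO_NUMBER(legacy_amount) > 100;

-- Audit how much dirty data Redshift was silently accepting
SELECT COUNT(*) AS bad_rows
FROM payments
WHERE legacy_amount IS NOT NULL
  AND TRY_CAST(legacy_amount AS NUMBER) IS NULL;
```

Run the audit query per suspect column during Phase 2; every non-zero bad_rows count is a cutover bug you just avoided.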
Phase 3: Data Migration
The fastest path is S3 as the bridge:
- UNLOAD from Redshift to S3 as Parquet (not CSV - Parquet preserves types and compresses better):
SQL - Redshift UNLOAD
UNLOAD ('SELECT * FROM schema.table_name')
TO 's3://migration-bucket/redshift-export/table_name/'
IAM_ROLE 'arn:aws:iam::123456789:role/RedshiftUnload'
FORMAT AS PARQUET
ALLOWOVERWRITE
PARALLEL ON
MAXFILESIZE 256 MB;
- Create a Snowflake external stage pointing to the S3 bucket.
- COPY INTO from stage to Snowflake tables:
SQL - Snowflake COPY INTO
COPY INTO schema.table_name
FROM @migration_stage/redshift-export/table_name/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
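The external stage itself is one-time setup; a minimal sketch, assuming a storage integration named s3_migration_int has already been created by an account admin and granted access to the bucket (both names are placeholders):

```sql
-- External stage over the S3 export location
CREATE OR REPLACE STAGE migration_stage
  URL = 's3://migration-bucket/redshift-export/'
  STORAGE_INTEGRATION = s3_migration_int
  FILE_FORMAT = (TYPE = PARQUET);

-- Sanity check: list the unloaded Parquet files before running COPY INTO
LIST @migration_stage;
```

A storage integration is preferred over embedding AWS keys in the stage definition; it delegates auth to an IAM role and keeps credentials out of SQL history.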
For ongoing replication during the migration period, consider AWS DMS (Database Migration Service) for CDC, or tools like Matillion or Talend for managed ELT. The Snowflake Migration Accelerator tool (available through Snowflake Professional Services) can automate much of the schema and SQL translation work. For details on loading data from Azure to Snowflake, see our Azure-to-Snowflake loading guide.
Phase 4: Performance Tuning in Snowflake
- Warehouse sizing: Start with X-Small for development. Use Small or Medium for production queries. Don't over-provision - Snowflake's auto-suspend (set to 60-300 seconds) and auto-resume mean you only pay when queries are running.
- Clustering keys: Most tables don't need them. Snowflake's automatic micro-partition pruning handles 90% of use cases. Add clustering keys only for very large tables (1TB+) where query patterns consistently filter on specific columns. Check pruning efficiency with SYSTEM$CLUSTERING_INFORMATION('table_name').
- Result caching: Snowflake caches query results for 24 hours. If a dashboard runs the same query and the underlying data hasn't changed, the result returns instantly at zero compute cost. This alone often makes Snowflake feel faster than Redshift for BI workloads.
- WLM queue mapping: Redshift's WLM (Workload Management) queues map to separate Snowflake virtual warehouses. Create a WH_ETL for pipeline jobs and a WH_BI for dashboard queries. They share storage but don't compete for compute. For more on this, see managing compute workloads for ETL vs analytics.
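The two-warehouse split sketched above looks like this in DDL (sizes and timeouts are starting points, not recommendations for every workload):

```sql
-- Pipeline warehouse: suspends quickly between batch jobs
CREATE WAREHOUSE wh_etl
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND   = 60      -- seconds idle before suspending
  AUTO_RESUME    = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- BI warehouse: longer suspend window keeps the local disk cache
-- warm between dashboard refreshes
CREATE WAREHOUSE wh_bi
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND   = 300
  AUTO_RESUME    = TRUE
  INITIALLY_SUSPENDED = TRUE;
```

The asymmetric AUTO_SUSPEND values are deliberate: ETL jobs arrive in bursts and benefit from fast suspend, while BI traffic trickles in all day and benefits from a warm cache.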
Phase 5: Testing and Validation
- Row count validation: Compare SELECT COUNT(*) for every table between Redshift and Snowflake. This catches load failures and filtering issues.
- Hash-based data comparison: For critical tables, compute a hash of key columns in both systems and compare. Use MD5(CONCAT(col1, col2, col3)) aggregated per partition to verify data integrity without comparing every row.
- Query regression testing: Run your top 50 most-executed queries (from Redshift's STL_QUERY log) against both systems and compare results. Watch for subtle differences in NULL handling, date truncation, and floating-point precision.
- BI dashboard validation: Have analysts open every production dashboard pointing at Snowflake and verify the numbers match. Don't skip this - dashboards often have implicit assumptions about query behavior that unit tests won't catch.
Phase 6: Cutover Planning
The cutover itself should be boring. If it's exciting, you didn't test enough. Here's the pattern that works:
- Run Redshift and Snowflake in parallel for 2-4 weeks with data syncing via CDC.
- Point non-critical dashboards at Snowflake first. Monitor for issues.
- Gradually migrate critical workloads. Keep Redshift as a fallback.
- Once all workloads are on Snowflake and stable for 1 week, decommission Redshift.
- Don't forget to cancel the Redshift Reserved Instance - we've seen teams pay for idle RI nodes for months after migration.
Key Takeaways
- UNLOAD to S3 as Parquet, COPY INTO Snowflake. It's the fastest and cheapest data migration path.
- Drop DISTKEY, SORTKEY, DISTSTYLE, and ENCODE from all DDL. Snowflake doesn't need them.
- Budget for 15-30% of your SQL statements needing translation work. The biggest pain point is implicit type casting differences.
- Use TRY_CAST() in Snowflake to handle dirty data that Redshift silently accepted.
- Map Redshift WLM queues to separate Snowflake warehouses. Set AUTO_SUSPEND to 60-300 seconds.
- Run parallel systems for 2-4 weeks before cutover. Test every dashboard, not just every query.
Frequently Asked Questions
Q: How long does a Redshift to Snowflake migration take?
Typical migrations take 8-16 weeks depending on the number of schemas, SQL complexity, and downstream dependencies. Schema translation and SQL conversion usually take 2-4 weeks. Data migration takes 1-2 weeks. Testing and validation take 3-6 weeks.
Q: Do I need to rewrite all my Redshift SQL for Snowflake?
Not all of it. Standard ANSI SQL works in both. But Redshift-specific syntax like DISTKEY, SORTKEY, DISTSTYLE, WLM queues, GETDATE(), CONVERT(), and certain LISTAGG behaviors need to be translated. Expect 15-30% of SQL statements to require changes.
Q: What is the best way to move data from Redshift to Snowflake?
UNLOAD from Redshift to S3 as Parquet files, then COPY INTO from S3 to Snowflake. This avoids data transfer fees and uses both platforms' native bulk load capabilities.
Q: Does Snowflake need DISTKEY and SORTKEY equivalents?
No. Snowflake uses automatic micro-partition pruning instead of distribution and sort keys. For very large tables (1TB+) with predictable query patterns, you can optionally add clustering keys, but most tables perform well without them.
