Why a D2C Brand Abandoned Redshift for Snowflake (and Cut Their Data Team's Hours by 40%)
A Redshift cluster that needed constant babysitting, VACUUM jobs that collided with morning queries, and a 2-person data team spending nearly half their time on infrastructure. Here's how we migrated 3.2TB to Snowflake in 7 weeks.
The Challenge
This D2C brand sells kitchen products - everything from chef's knives to cast iron cookware - with about $28M in annual revenue split between their Shopify storefront and Amazon. They'd been running a Redshift cluster (dc2.large, 4 nodes) for about 2 years, and it was becoming the data team's biggest headache.
The nightly VACUUM operations were the worst offender. Redshift marks deleted rows as tombstones rather than reclaiming the space immediately, so a VACUUM job had to run every night. The problem: it frequently wasn't finished by 7am when the morning analytics queries kicked off. When VACUUM and analytical queries competed for the same cluster resources, dashboard load times went from 8 seconds to over 2 minutes. The marketing team started screenshotting dashboards at the end of each day because they couldn't trust that they'd load in the morning.
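To make the nightly routine concrete, here's a sketch of the kind of maintenance the team was running - the table names are illustrative, not taken from their actual schema:

```sql
-- Estimate how many deleted-but-unreclaimed rows each table is carrying
-- (tbl_rows counts all rows including tombstones; estimated_visible_rows excludes them)
SELECT "table", tbl_rows, estimated_visible_rows
FROM svv_table_info
ORDER BY tbl_rows - estimated_visible_rows DESC
LIMIT 10;

-- The nightly job then ran statements like this for each large table
VACUUM FULL orders TO 95 PERCENT;
```

It's exactly this kind of per-table housekeeping, and its unpredictable runtime, that collided with the 7am query load.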
Weekend traffic spikes made things worse. This brand's peak sales period is Saturday and Sunday - cooking enthusiasts browsing while planning their week. The Redshift cluster was right-sized for weekday query loads but choked on weekend traffic. WLM (Workload Management) queue configuration was a constant guessing game: allocate too much memory to the ETL queue and analytics suffers, allocate too little and ingestion falls behind.
The 2-person data team - one data engineer, one analytics engineer - estimated they spent about 40% of their time on Redshift cluster management: tuning WLM, monitoring VACUUM, resizing nodes, troubleshooting sort key and dist key performance issues. The remaining 60% was split between building new models and answering ad-hoc questions from the business. Actual proactive analysis? Almost none.
What We Built
We migrated 3.2TB of data from Redshift to Snowflake using S3 as the transfer medium. The process was straightforward: UNLOAD from Redshift to S3 in Parquet format, then COPY INTO Snowflake from those same S3 files. We ran the migration in parallel across 8 schemas - the full data transfer took about 14 hours.
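The per-table transfer pattern looks roughly like this. Bucket names, the IAM role, and the storage integration are placeholders, not the client's actual configuration:

```sql
-- 1) Redshift side: export each table to S3 as Parquet
UNLOAD ('SELECT * FROM analytics.order_items')
TO 's3://migration-bucket/analytics/order_items/'
IAM_ROLE 'arn:aws:iam::<account-id>:role/redshift-unload'   -- placeholder ARN
FORMAT AS PARQUET;

-- 2) Snowflake side: point an external stage at the same prefix
CREATE STAGE migration_stage
  URL = 's3://migration-bucket/analytics/'
  STORAGE_INTEGRATION = s3_migration_int;  -- assumes a pre-created integration

-- 3) Load, mapping Parquet columns to table columns by name
COPY INTO analytics.order_items
FROM @migration_stage/order_items/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```

Running this pattern concurrently across the 8 schemas is what kept the total transfer to roughly 14 hours.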
The SQL translation was the more interesting part. They had 67 Redshift SQL views, and we had to convert each one. Most Redshift-specific syntax (DISTKEY, SORTKEY, DISTSTYLE) simply gets dropped, because Snowflake's micro-partitioning handles data distribution automatically. We evaluated the tables behind all 67 views for clustering key opportunities and found that only 3 actually benefited from explicit clustering keys: order_items (clustered on order_date), page_views (clustered on session_date), and inventory_snapshots (clustered on snapshot_date). Everything else performed fine without clustering because the tables weren't large enough for it to matter.
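In Snowflake, adding those three clustering keys is a one-line DDL change per table:

```sql
-- Only the three large, date-filtered tables got explicit clustering keys
ALTER TABLE order_items         CLUSTER BY (order_date);
ALTER TABLE page_views          CLUSTER BY (session_date);
ALTER TABLE inventory_snapshots CLUSTER BY (snapshot_date);
```

A quick way to sanity-check a candidate key before committing is `SELECT SYSTEM$CLUSTERING_INFORMATION('order_items', '(order_date)')`, which reports average clustering depth for the proposed key.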
For compute, we set up auto-scaling virtual warehouses. Weekday analytics runs on an XS warehouse (1 credit/hour). On weekends, the warehouse auto-scales to Medium when concurrent query count exceeds 4. This replaced the Redshift approach of provisioning for peak and paying for that capacity 24/7. Fivetran handles data ingestion from Shopify, Amazon Seller Central, Google Analytics, and Facebook Ads - replacing the custom Python scripts they'd been running on an EC2 instance.
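A sketch of the warehouse setup. Note that Snowflake's built-in multi-cluster scaling adds same-size clusters under concurrency pressure rather than resizing the warehouse itself, so the weekend bump to Medium described above would be handled by a scheduled resize; warehouse and threshold values here are illustrative:

```sql
-- Weekday analytics warehouse: starts XS, suspends when idle,
-- scales out (adds clusters) when queries start queueing
CREATE WAREHOUSE analytics_wh
  WAREHOUSE_SIZE    = 'XSMALL'
  AUTO_SUSPEND      = 60        -- seconds of inactivity before suspending
  AUTO_RESUME       = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY    = 'STANDARD';

-- Weekend size bump, e.g. fired from a scheduled task on Saturday morning
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```

Because a suspended warehouse costs nothing, this replaces Redshift's provision-for-peak model with pay-for-what-runs.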
We also built a proper dbt project to replace the manual SQL view chain. The 67 views became 42 dbt models organized into staging, intermediate, and mart layers. dbt tests run automatically after each transformation, catching data quality issues before they hit dashboards. Looker connects directly to Snowflake for BI, replacing the Redshift-connected Mode instance they'd been using.
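For a sense of what the staging layer looks like, here's a representative dbt model - the model and column names are illustrative, not the client's actual schema:

```sql
-- models/staging/stg_shopify__orders.sql (illustrative model name)
with source as (

    select * from {{ source('shopify', 'orders') }}

),

renamed as (

    select
        id          as order_id,
        customer_id,
        created_at  as ordered_at,
        total_price as order_total
    from source

)

select * from renamed
```

Each of the 42 models follows this pattern, with schema tests (uniqueness, not-null, referential integrity) declared alongside and run after every build.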
Results
- Roughly 40% of the 2-person data team's hours reclaimed - the cluster-management work (WLM tuning, VACUUM monitoring, node resizing) went away entirely
- 3.2TB migrated from Redshift to Snowflake in 7 weeks end to end; the bulk data transfer itself took about 14 hours
- 67 manually maintained SQL views replaced by 42 tested dbt models
- Morning dashboards load reliably - no more VACUUM jobs colliding with 7am queries
- dbt tests surfaced a refund double-counting bug in the first week that had gone unnoticed on Redshift
Tech Stack
- Snowflake (warehousing and compute, auto-scaling virtual warehouses)
- Fivetran (ingestion from Shopify, Amazon Seller Central, Google Analytics, and Facebook Ads)
- dbt (transformations - staging, intermediate, and mart layers - plus automated tests)
- Looker (BI)
- Amazon S3 (migration transfer medium)
What We Learned
- Most Redshift DISTKEY/SORTKEY logic doesn't need a Snowflake equivalent. We analyzed all 67 views and only 3 tables benefited from Snowflake clustering keys. The instinct is to map every DISTKEY to a clustering key, but Snowflake's micro-partitioning handles most cases automatically. Over-clustering actually hurt performance on a few smaller tables during testing.
- S3-as-transfer-medium is the simplest Redshift-to-Snowflake path. We considered third-party migration tools, but UNLOAD-to-S3 and COPY-INTO-Snowflake is well-documented, predictable, and doesn't introduce another vendor. The 3.2TB transfer took 14 hours. Not fast, but reliable.
- Auto-scaling warehouses solve the weekend spike problem completely. On Redshift, the team spent hours every Friday tweaking WLM queues in anticipation of weekend traffic. On Snowflake, the warehouse scales from XS to Medium automatically when it needs to and scales back down when traffic drops. The team doesn't think about it at all.
- dbt tests catch what manual SQL reviews miss. Within the first week of running dbt tests post-migration, we caught a Fivetran sync issue where Amazon order refunds were being double-counted. The existing Redshift views had this bug too - it just hadn't been noticed because nobody was running automated tests.
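The double-counting check is the kind of thing a singular dbt test expresses in a few lines. This is a hedged sketch - the test and model names are hypothetical:

```sql
-- tests/assert_refunds_not_double_counted.sql (singular dbt test, illustrative)
-- dbt treats any returned rows as a failure, so this fails
-- whenever a refund id appears more than once
select
    refund_id,
    count(*) as occurrences
from {{ ref('stg_amazon__refunds') }}
group by refund_id
having count(*) > 1
```

Once a test like this is in the DAG, it runs on every build - which is how the pre-existing bug surfaced within a week of go-live.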
Spending Too Much Time Babysitting Your Cluster?
VACUUM schedules, WLM tuning, resize operations - if your data team is spending more time on infrastructure than analysis, let's talk.
Start a Conversation