Implementing Slowly Changing Dimensions in Snowflake with dbt
Quick answer: Use dbt snapshots in Snowflake to implement SCD Type 2 with minimal code. Configure either a timestamp strategy (when your source has an updated_at column) or a check strategy (when it doesn't). dbt handles the MERGE logic, effective dating (dbt_valid_from/dbt_valid_to), and current record tracking automatically. Always use surrogate keys in fact tables -- never natural keys -- when joining to SCD dimensions.
Last updated: February 2025
What Slowly Changing Dimensions Actually Solve
Dimension attributes change. A customer moves from Dallas to Denver. A product gets reclassified from "Electronics" to "Smart Home." An employee transfers from the Sales department to Marketing. The question isn't whether these changes happen -- it's whether you need to remember the old values.
If a sales report from Q1 should show the customer in Dallas (where they were when the sale happened), you need SCD Type 2. If it should show Denver (their current location), Type 1 is fine. Getting this wrong means your historical reports silently rewrite history, and nobody notices until an auditor or an analyst catches the discrepancy. Here's how each type works, with a focus on implementing the most common ones in Snowflake using dbt.
The SCD Type Reference
Type 0: Retain Original
Never update the attribute. The value captured when the record was first loaded stays forever. Use this for attributes that genuinely don't change: birth date, original signup date, social security number. In practice, you implement Type 0 by simply not including the column in any update or snapshot logic.
Type 1: Overwrite
Replace the old value with the new value. No history retained. The dimension always shows the current state. Use this for corrections (typo in a name) or when historical context doesn't matter (customer's preferred language). In dbt, this is just a standard incremental model with a merge strategy.
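A Type 1 dimension as a dbt incremental model might look like the sketch below. The model and source names are illustrative, not from a real project; the pattern is the standard incremental merge.

```sql
-- models/dim_customer_type1.sql (illustrative names)
{{
    config(
        materialized='incremental',
        unique_key='customer_id',
        incremental_strategy='merge'
    )
}}

SELECT
    customer_id,
    customer_name,
    email,
    city,
    state,
    updated_at
FROM {{ source('crm', 'customers') }}

{% if is_incremental() %}
  -- On incremental runs, only pick up rows changed since the last load;
  -- the merge overwrites matching rows, so no history is kept.
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

Because the merge matches on `unique_key` and overwrites, the table always reflects the latest source state.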
Type 2: Add New Row with Effective Dates
Insert a new row for the changed record. The old row gets an end date; the new row gets a start date and a NULL end date (indicating it's current). This preserves full history. Use Type 2 for anything where you need to know "what was the value at a specific point in time?" -- customer geography, employee department, product category.
Type 2 is the workhorse of SCDs. It's also the one most teams implement incorrectly. dbt snapshots handle the heavy lifting, which is why we focus on them here.
Type 3: Add Previous Value Column
Add a "previous_value" column alongside the current value. You get one level of history. Use this when you only need the immediate prior state -- like tracking that a product was previously in the "Gadgets" category before being moved to "Smart Home." Limited but simple.
Type 6: Hybrid (1 + 2 + 3)
Combines Type 1 (current value in all rows), Type 2 (historical rows with effective dates), and Type 3 (previous value column). Every historical row gets the current value updated alongside its historical snapshot. This gives you the most flexibility for reporting but adds complexity to the loading process. Use it when analysts need both point-in-time accuracy and current-state analysis from the same table.
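Rather than maintaining Type 6 columns in the load process, one option is to derive them from a Type 2 snapshot with window functions. This sketch assumes the customer_snapshot table defined later in this article:

```sql
-- Illustrative Type 6 view over a Type 2 snapshot: every historical row
-- carries its point-in-time value plus current and previous values.
SELECT
    customer_id,
    city AS historical_city,                          -- Type 2: point-in-time value
    LAST_VALUE(city) OVER (
        PARTITION BY customer_id
        ORDER BY dbt_valid_from
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS current_city,                                -- Type 1: same current value on every row
    LAG(city) OVER (
        PARTITION BY customer_id ORDER BY dbt_valid_from
    ) AS previous_city,                               -- Type 3: immediately prior value
    dbt_valid_from,
    dbt_valid_to
FROM analytics.snapshots.customer_snapshot;
```

Deriving the columns in a view keeps the snapshot load simple while still giving analysts all three perspectives.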
SCD Type 2 with dbt Snapshots
dbt's snapshot feature was built for SCD Type 2. You define a snapshot configuration, and dbt generates the MERGE statement that compares source data against the snapshot table, inserts new records, and closes changed records with effective dates.
Check Strategy vs. Timestamp Strategy
Timestamp strategy: Use when your source table has a reliable updated_at column. dbt compares the timestamp -- if it's newer than what's in the snapshot, the record is considered changed. This is faster because dbt only looks at one column.
Check strategy: Use when your source doesn't have a timestamp column. dbt hashes the specified columns and compares them. If any monitored column's value differs, the record is treated as changed. This is slower on large tables because it must compare every specified column for every row.
Complete dbt Snapshot Configuration
```sql
{% snapshot customer_snapshot %}

{{
    config(
        target_database='analytics',
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at',
        invalidate_hard_deletes=True
    )
}}

SELECT
    customer_id,
    customer_name,
    email,
    city,
    state,
    segment,
    updated_at
FROM {{ source('crm', 'customers') }}

{% endsnapshot %}
```
```sql
{% snapshot product_snapshot %}

{{
    config(
        target_database='analytics',
        target_schema='snapshots',
        unique_key='product_sku',
        strategy='check',
        check_cols=['product_name', 'category', 'subcategory', 'list_price']
    )
}}

SELECT
    product_sku,
    product_name,
    category,
    subcategory,
    list_price
FROM {{ source('catalog', 'products') }}

{% endsnapshot %}
```
Run snapshots with `dbt snapshot`. dbt creates and maintains the snapshot table automatically, adding these columns:
- dbt_scd_id: A unique identifier for each snapshot row
- dbt_updated_at: When this snapshot row was created or updated
- dbt_valid_from: When this version of the record became active
- dbt_valid_to: When this version was superseded (NULL means it's the current version)
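Because NULL in dbt_valid_to marks the current version, pulling the present state of each entity from the snapshot is a one-line filter:

```sql
-- Current state of each customer: exactly one row per customer_id.
SELECT customer_id, customer_name, city, state, segment
FROM {{ ref('customer_snapshot') }}
WHERE dbt_valid_to IS NULL
```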
Using Snapshots in Downstream Models
The whole point of SCD Type 2 is joining fact records to the correct dimension version. If an order was placed on 2026-01-15, and the customer moved from Dallas to Denver on 2026-02-01, that order should join to the Dallas version of the customer record.
```sql
SELECT
    o.order_id,
    o.order_date,
    o.total_amount,
    c.customer_name,
    c.city,
    c.state,
    c.segment,
    c.dbt_valid_from AS customer_version_start,
    c.dbt_valid_to   AS customer_version_end
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('customer_snapshot') }} c
    ON o.customer_id = c.customer_id
    AND o.order_date >= c.dbt_valid_from
    AND (o.order_date < c.dbt_valid_to OR c.dbt_valid_to IS NULL)
```
The range join on dbt_valid_from and dbt_valid_to ensures each order maps to the customer record that was active at the time of the order. Note that the interval is half-open (>= on the start, < on the end), so an order dated exactly on a change boundary matches only one version. The IS NULL check on dbt_valid_to handles the current record.
Why Surrogate Keys Matter with SCDs
If your fact table uses the natural key customer_id as a foreign key, you can join to the dimension -- but you'll get multiple rows back (one per historical version) unless you add the temporal join logic above. If your fact table uses a surrogate key like dbt_scd_id or a generated hash key, each fact row points to exactly one dimension version. No temporal join needed.
In star schema design, surrogate keys are the standard practice for exactly this reason. Never use natural keys as foreign keys in fact tables when your dimensions have SCD Type 2 history. You'll end up with duplicate fact rows or incorrect joins that nobody catches until the numbers don't add up.
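One way to bake the surrogate key into the fact table is to resolve the temporal join once, at fact-build time, and store dbt's dbt_scd_id as the foreign key. This sketch reuses the illustrative staging and snapshot models from above:

```sql
-- Illustrative fact build: resolve each order to the surrogate key of the
-- customer version active at order time. Downstream queries then join on
-- customer_sk = dbt_scd_id with no temporal logic at all.
SELECT
    o.order_id,
    o.order_date,
    o.total_amount,
    c.dbt_scd_id AS customer_sk   -- version-specific surrogate key
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('customer_snapshot') }} c
    ON o.customer_id = c.customer_id
    AND o.order_date >= c.dbt_valid_from
    AND (o.order_date < c.dbt_valid_to OR c.dbt_valid_to IS NULL)
```

The temporal logic runs once during the load instead of in every analyst query, which is both faster and harder to get wrong.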
Testing SCD Correctness
SCD Type 2 tables can develop subtle bugs that are hard to catch with standard data tests. Here are three tests you should run after every snapshot:
1. No Gaps in Effective Dates
For each business key, the dbt_valid_to of one version should equal the dbt_valid_from of the next version. Gaps mean you lost history; overlaps mean you'll get duplicate joins.
```sql
WITH versioned AS (
    SELECT
        customer_id,
        dbt_valid_from,
        dbt_valid_to,
        LEAD(dbt_valid_from) OVER (
            PARTITION BY customer_id ORDER BY dbt_valid_from
        ) AS next_valid_from
    FROM {{ ref('customer_snapshot') }}
)
SELECT *
FROM versioned
WHERE dbt_valid_to IS NOT NULL
  AND dbt_valid_to != next_valid_from
```
2. Exactly One Current Record Per Business Key
Every business key should have exactly one row where dbt_valid_to IS NULL. Zero means the entity was deleted (expected if you enabled invalidate_hard_deletes). More than one means a bug.
```sql
SELECT customer_id, COUNT(*) AS current_count
FROM {{ ref('customer_snapshot') }}
WHERE dbt_valid_to IS NULL
GROUP BY customer_id
HAVING COUNT(*) > 1
```
3. No Future-Dated Records
No dbt_valid_from should be in the future. If it is, something went wrong with the source timestamp or the snapshot execution timing.
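Following the pattern of the two tests above, this check can be written as a query that should return zero rows:

```sql
-- Returns any snapshot rows that claim to become valid in the future;
-- a non-empty result points to a bad source timestamp or clock skew.
SELECT customer_id, dbt_valid_from
FROM {{ ref('customer_snapshot') }}
WHERE dbt_valid_from > CURRENT_TIMESTAMP()
```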
Common Mistakes and Gotchas
- Running snapshots too infrequently: If you snapshot daily but your source changes 3 times in a day, you lose the intermediate states. Daily is the minimum for most use cases; hourly if your source is volatile.
- Unreliable unique key: dbt snapshots use MERGE under the hood. If your source table doesn't have a reliable unique key, the snapshot will produce duplicates. Composite keys work but must be truly unique.
- Not testing after schema changes: If the source table adds a column, your snapshot won't capture it until you update the SELECT. If a column is renamed, the snapshot breaks silently.
- Forgetting invalidate_hard_deletes: By default, dbt snapshots don't handle source deletions. If a customer is deleted from the source, the snapshot keeps the old record as current forever. Set `invalidate_hard_deletes=True` to close deleted records. This matters especially when your pipeline consumes deletions surfaced by Snowflake Streams or other CDC patterns.
Key Takeaways
- SCD Type 2 is the most common requirement -- use dbt snapshots to implement it with minimal code
- Choose timestamp strategy when your source has a reliable updated_at column; use check strategy when it doesn't
- Always use surrogate keys (not natural keys) as foreign keys in fact tables joining to SCD dimensions
- Join fact tables to snapshots using a half-open range condition on dbt_valid_from and dbt_valid_to (>= start, < end), with an IS NULL check for the current record
- Test for date gaps, overlaps, and exactly one current record per business key after every snapshot run
- Run snapshots at least daily -- more often if your source changes frequently within a day
- Enable invalidate_hard_deletes if source records can be deleted, otherwise deleted entities stay "current" forever
Frequently Asked Questions
Q: What is the difference between SCD Type 1 and Type 2?
SCD Type 1 overwrites the old value with the new value, keeping no history. Type 2 inserts a new row with the updated value and marks the old row with an end date, preserving full change history. Use Type 1 for corrections (fixing a typo in a name); use Type 2 for business-meaningful changes (customer moving to a new city).
Q: How do dbt snapshots implement SCD Type 2?
dbt snapshots use MERGE statements under the hood. On each run, they compare the current source data against the snapshot table. New rows get inserted with dbt_valid_from set to the current timestamp. Changed rows get their existing record closed (dbt_valid_to set to current timestamp) and a new record inserted. Deleted rows get closed. The result is a full Type 2 history table.
Q: Should I use check strategy or timestamp strategy for dbt snapshots?
Use timestamp strategy when your source has a reliable updated_at column -- it's more performant because dbt only compares timestamps. Use check strategy when your source lacks a timestamp column -- dbt compares all specified column values to detect changes. Check strategy is slower on large tables because it must hash and compare every monitored column.
Q: Why should I use surrogate keys instead of natural keys in fact tables with SCDs?
Natural keys point to the business entity but not to a specific version of that entity. If customer 12345 changes from New York to Chicago, a fact table with the natural key 12345 can only join to one version. A surrogate key unique per SCD version lets you join each fact row to the correct dimension version that was active when the transaction occurred.
