Implementing Slowly Changing Dimensions in Snowflake with dbt

Celestinfo Software Solutions Pvt. Ltd. Jan 23, 2025

Quick answer: Use dbt snapshots in Snowflake to implement SCD Type 2 with minimal code. Configure either a timestamp strategy (when your source has an updated_at column) or a check strategy (when it doesn't). dbt handles the MERGE logic, effective dating (dbt_valid_from/dbt_valid_to), and current record tracking automatically. Always use surrogate keys in fact tables -- never natural keys -- when joining to SCD dimensions.

Last updated: February 2025

What Slowly Changing Dimensions Actually Solve

Dimension attributes change. A customer moves from Dallas to Denver. A product gets reclassified from "Electronics" to "Smart Home." An employee transfers from the Sales department to Marketing. The question isn't whether these changes happen -- it's whether you need to remember the old values.

If a sales report from Q1 should show the customer in Dallas (where they were when the sale happened), you need SCD Type 2. If it should show Denver (their current location), Type 1 is fine. Getting this wrong means your historical reports silently rewrite history, and nobody notices until an auditor or an analyst catches the discrepancy. Here's how each type works, with a focus on implementing the most common ones in Snowflake using dbt.

The SCD Type Reference

Type 0: Retain Original

Never update the attribute. The value captured when the record was first loaded stays forever. Use this for attributes that genuinely don't change: birth date, original signup date, social security number. In practice, you implement Type 0 by simply not including the column in any update or snapshot logic.


Type 1: Overwrite

Replace the old value with the new value. No history retained. The dimension always shows the current state. Use this for corrections (typo in a name) or when historical context doesn't matter (customer's preferred language). In dbt, this is just a standard incremental model with a merge strategy.
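As a sketch (the model name is illustrative), a Type 1 dimension in dbt can be an incremental model with the merge strategy, so changed rows simply overwrite their previous values:

```sql
-- models/dim_customer_type1.sql (illustrative name)
{{
    config(
      materialized='incremental',
      unique_key='customer_id',
      incremental_strategy='merge'
    )
}}

SELECT
    customer_id,
    customer_name,
    email,
    city,
    state,
    segment,
    updated_at
FROM {{ source('crm', 'customers') }}

{% if is_incremental() %}
-- Only pick up rows changed since the last run; matched keys are overwritten
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

Because unique_key is set, dbt merges on customer_id: existing rows are updated in place and no history is kept, which is exactly the Type 1 behavior.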


Type 2: Add New Row with Effective Dates

Insert a new row for the changed record. The old row gets an end date; the new row gets a start date and a NULL end date (indicating it's current). This preserves full history. Use Type 2 for anything where you need to know "what was the value at a specific point in time?" -- customer geography, employee department, product category.

Type 2 is the workhorse of SCDs. It's also the one most teams implement incorrectly. dbt snapshots handle the heavy lifting, which is why we focus on them here.


Type 3: Add Previous Value Column

Add a "previous_value" column alongside the current value. You get one level of history. Use this when you only need the immediate prior state -- like tracking that a product was previously in the "Gadgets" category before being moved to "Smart Home." Limited but simple.
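A minimal sketch of the Type 3 update (table and column names here are hypothetical):

```sql
-- Type 3: shift the current value into previous_category before overwriting it
UPDATE dim_product
SET previous_category   = category,
    category            = 'Smart Home',
    category_changed_at = CURRENT_DATE()
WHERE product_sku = 'SKU-1001'
  AND category <> 'Smart Home';
```

Each subsequent change overwrites previous_category, which is why Type 3 only ever preserves one level of history.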


Type 6: Hybrid (1 + 2 + 3)

Combines Type 1 (current value in all rows), Type 2 (historical rows with effective dates), and Type 3 (previous value column). Every historical row gets the current value updated alongside its historical snapshot. This gives you the most flexibility for reporting but adds complexity to the loading process. Use it when analysts need both point-in-time accuracy and current-state analysis from the same table.
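One way to sketch the Type 6 maintenance step (table and column names are hypothetical): after the new Type 2 row is inserted, propagate the new current value to every historical row for that business key:

```sql
-- Type 6: after inserting the new Type 2 row, refresh the Type 1-style
-- current_segment column on all historical versions of the customer
UPDATE dim_customer_scd6 d
SET current_segment = s.segment
FROM stg_customers s
WHERE d.customer_id = s.customer_id
  AND d.current_segment <> s.segment;
```

Analysts can then read segment for point-in-time reporting and current_segment for current-state rollups from the same table.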

SCD Type 2 with dbt Snapshots

dbt's snapshot feature was built for SCD Type 2. You define a snapshot configuration, and dbt generates the MERGE statement that compares source data against the snapshot table, inserts new records, and closes changed records with effective dates.


Check Strategy vs. Timestamp Strategy

Timestamp strategy: Use when your source table has a reliable updated_at column. dbt compares the timestamp -- if it's newer than what's in the snapshot, the record is considered changed. This is faster because dbt only looks at one column.

Check strategy: Use when your source doesn't have a timestamp column. dbt compares the current value of each column listed in check_cols against the snapshotted value. If any monitored column's value differs, the record is treated as changed. This is slower on large tables because it must compare every specified column for every row.


Complete dbt Snapshot Configuration

```sql
-- snapshots/customer_snapshot.sql (timestamp strategy)
{% snapshot customer_snapshot %}

{{
    config(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at',
      invalidate_hard_deletes=True
    )
}}

SELECT
    customer_id,
    customer_name,
    email,
    city,
    state,
    segment,
    updated_at
FROM {{ source('crm', 'customers') }}

{% endsnapshot %}
```
```sql
-- snapshots/product_snapshot.sql (check strategy)
{% snapshot product_snapshot %}

{{
    config(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='product_sku',
      strategy='check',
      check_cols=['product_name', 'category', 'subcategory', 'list_price']
    )
}}

SELECT
    product_sku,
    product_name,
    category,
    subcategory,
    list_price
FROM {{ source('catalog', 'products') }}

{% endsnapshot %}
```

Run snapshots with the dbt snapshot command. dbt creates and maintains the snapshot table automatically, adding these columns:

- dbt_scd_id: a unique surrogate key for each row version
- dbt_updated_at: the updated_at value (or run timestamp) when the version was captured
- dbt_valid_from: the timestamp when this version became effective
- dbt_valid_to: the timestamp when this version was superseded (NULL for the current version)

Using Snapshots in Downstream Models

The whole point of SCD Type 2 is joining fact records to the correct dimension version. If an order was placed on 2026-01-15, and the customer moved from Dallas to Denver on 2026-02-01, that order should join to the Dallas version of the customer record.


```sql
-- models/marts/fct_orders.sql
SELECT
    o.order_id,
    o.order_date,
    o.total_amount,
    c.customer_name,
    c.city,
    c.state,
    c.segment,
    c.dbt_valid_from AS customer_version_start,
    c.dbt_valid_to   AS customer_version_end
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('customer_snapshot') }} c
    ON o.customer_id = c.customer_id
    AND o.order_date >= c.dbt_valid_from
    AND (o.order_date < c.dbt_valid_to OR c.dbt_valid_to IS NULL)
```

The range join on dbt_valid_from and dbt_valid_to (a half-open interval: inclusive start, exclusive end) ensures each order maps to the customer record that was active at the time of the order. The IS NULL check on dbt_valid_to handles the current record.

Why Surrogate Keys Matter with SCDs

If your fact table uses the natural key customer_id as a foreign key, you can join to the dimension -- but you'll get multiple rows back (one per historical version) unless you add the temporal join logic above. If your fact table uses a surrogate key like dbt_scd_id or a generated hash key, each fact row points to exactly one dimension version. No temporal join needed.

In star schema design, surrogate keys are the standard practice for exactly this reason. Never use natural keys as foreign keys in fact tables when your dimensions have SCD Type 2 history. You'll end up with duplicate fact rows or incorrect joins that nobody catches until the numbers don't add up.
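With dbt snapshots, dbt_scd_id already serves as a version-level surrogate key, so the temporal join can be resolved once when the fact table is built and downstream queries can join on the surrogate key alone. A sketch (the model name is illustrative):

```sql
-- models/marts/fct_orders_versioned.sql (illustrative name)
-- Resolve the dimension version at load time and store its surrogate key
SELECT
    o.order_id,
    o.order_date,
    o.total_amount,
    c.dbt_scd_id AS customer_version_sk
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('customer_snapshot') }} c
    ON o.customer_id = c.customer_id
    AND o.order_date >= c.dbt_valid_from
    AND (o.order_date < c.dbt_valid_to OR c.dbt_valid_to IS NULL)
```

Reports then join to the snapshot on customer_version_sk = dbt_scd_id with no date logic, and each fact row resolves to exactly one dimension version.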

Testing SCD Correctness

SCD Type 2 tables can develop subtle bugs that are hard to catch with standard data tests. Here are three tests you should run after every snapshot:


1. No Gaps in Effective Dates

For each business key, the dbt_valid_to of one version should equal the dbt_valid_from of the next version. Gaps mean you lost history; overlaps mean you'll get duplicate joins.

```sql
-- tests/no_date_gaps_customer.sql
WITH versioned AS (
    SELECT
        customer_id,
        dbt_valid_from,
        dbt_valid_to,
        LEAD(dbt_valid_from) OVER (
            PARTITION BY customer_id ORDER BY dbt_valid_from
        ) AS next_valid_from
    FROM {{ ref('customer_snapshot') }}
)
SELECT *
FROM versioned
WHERE dbt_valid_to IS NOT NULL
  AND dbt_valid_to != next_valid_from
```

2. Exactly One Current Record Per Business Key

Every business key should have exactly one row where dbt_valid_to IS NULL. Zero means the entity was deleted (expected if you enabled invalidate_hard_deletes). More than one means a bug.

```sql
-- tests/one_current_record.sql
SELECT customer_id, COUNT(*) AS current_count
FROM {{ ref('customer_snapshot') }}
WHERE dbt_valid_to IS NULL
GROUP BY customer_id
HAVING COUNT(*) > 1
```

3. No Future-Dated Records

No dbt_valid_from should be in the future. If it is, something went wrong with the source timestamp or the snapshot execution timing.
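Following the pattern of the two tests above, this can be expressed as a singular test that returns any offending rows:

```sql
-- tests/no_future_valid_from.sql
SELECT customer_id, dbt_valid_from
FROM {{ ref('customer_snapshot') }}
WHERE dbt_valid_from > CURRENT_TIMESTAMP()
```

An empty result means the test passes; any rows returned point to a clock or timestamp problem worth investigating before trusting the snapshot.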

Common Mistakes and Gotchas

- Joining facts to a Type 2 dimension on the natural key without the temporal range condition, which fans out one fact row per historical version.
- Using the timestamp strategy with an updated_at column the source system doesn't reliably maintain; changes that don't bump the timestamp are silently missed.
- Omitting invalidate_hard_deletes, so records deleted in the source remain "current" in the snapshot forever.
- Editing snapshot tables by hand; dbt owns these tables, and manual changes can create gaps or overlaps in the effective dates.

Key Takeaways

- Use dbt snapshots for SCD Type 2: they generate the MERGE logic, effective dating (dbt_valid_from/dbt_valid_to), and current-record tracking for you.
- Pick the timestamp strategy when the source has a reliable updated_at column; fall back to the check strategy otherwise.
- Join facts to Type 2 dimensions with a range condition on the effective dates, or store a per-version surrogate key in the fact table so no temporal join is needed at query time.
- Test snapshots after every run: no gaps or overlaps in effective dates, exactly one current record per business key, and no future-dated records.

Frequently Asked Questions

Q: What is the difference between SCD Type 1 and Type 2?

SCD Type 1 overwrites the old value with the new value, keeping no history. Type 2 inserts a new row with the updated value and marks the old row with an end date, preserving full change history. Use Type 1 for corrections (fixing a typo in a name); use Type 2 for business-meaningful changes (customer moving to a new city).

Q: How do dbt snapshots implement SCD Type 2?

dbt snapshots use MERGE statements under the hood. On each run, they compare the current source data against the snapshot table. New rows get inserted with dbt_valid_from set to the current timestamp. Changed rows get their existing record closed (dbt_valid_to set to current timestamp) and a new record inserted. Deleted rows get closed. The result is a full Type 2 history table.

Q: Should I use check strategy or timestamp strategy for dbt snapshots?

Use timestamp strategy when your source has a reliable updated_at column -- it's more performant because dbt only compares timestamps. Use check strategy when your source lacks a timestamp column -- dbt compares all specified column values to detect changes. Check strategy is slower on large tables because it must compare every monitored column for every row.

Q: Why should I use surrogate keys instead of natural keys in fact tables with SCDs?

Natural keys point to the business entity but not to a specific version of that entity. If customer 12345 changes from New York to Chicago, a fact table with the natural key 12345 can only join to one version. A surrogate key unique per SCD version lets you join each fact row to the correct dimension version that was active when the transaction occurred.

Ameer, Data Governance Specialist

Ameer is a Data Governance Specialist at CelestInfo focused on RBAC, compliance frameworks, data security, and access control across cloud data platforms.
