Building a Data Quality Framework: From Random Checks to Systematic Trust
Quick answer: Stop writing one-off SQL checks. Build systematic data quality using four layers: schema tests (dbt generic tests for nulls, uniqueness, referential integrity), custom business logic tests (row count ranges, metric thresholds), automated profiling (detect schema drift and distribution changes), and freshness monitoring (is the data actually up to date?). Focus tests on the 20–30 columns that drive business decisions, not every field in every table. The biggest threat to your quality program isn't bad data - it's false positive alerts that make your team stop paying attention.
Last updated: August 2025
Why Ad-Hoc Quality Checks Always Fail
Here's the pattern we see in every company without a quality framework: someone finds a data issue, writes a SQL query to check for it, runs it once, pastes the result in Slack, and moves on. That query never runs again. Two months later, the same issue resurfaces - or a worse variant of it - and the cycle repeats.
Ad-hoc checks fail because they're reactive, undocumented, and unmaintained. They exist in someone's query history or a random Slack thread. They don't run automatically, they don't alert anyone, and they don't block bad data from reaching dashboards. You need a system, not a habit.
The Six Dimensions of Data Quality
Before building checks, you need a shared vocabulary for what "quality" means. There are six standard dimensions:
- Completeness: Are all expected records present? Did today's pipeline load all 50K expected rows, or only 12K?
- Accuracy: Do values reflect reality? Does the revenue column match what's actually in Stripe?
- Consistency: Do related fields agree? If status = 'shipped', is shipped_date populated?
- Timeliness: Is the data fresh enough for its use case? A dashboard showing yesterday's data is fine for monthly reporting, unacceptable for fraud detection.
- Uniqueness: Are there duplicate records? A customer duplicated three times inflates that customer's contribution to your user count by 200%.
- Validity: Do values conform to expected formats? Email addresses that don't contain "@", dates in the year 1900, negative order quantities.
Not all dimensions matter equally for every table. Revenue tables need accuracy and completeness above all. User tables need uniqueness. Event tables need timeliness. Prioritize based on business impact.
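To make the validity dimension concrete, here is a minimal sketch of the kind of check it implies, written as plain SQL against the warehouse. The table and column names (raw_orders, customer_email, order_date, quantity) and the date cutoff are illustrative assumptions, not a prescription:

-- Hypothetical validity check: returns rows that violate basic format rules.
-- Table, column names, and thresholds are assumed; adapt them to your own schema.
SELECT order_id, customer_email, order_date, quantity
FROM raw_orders
WHERE customer_email NOT LIKE '%@%'       -- malformed email address
   OR order_date < DATE '2000-01-01'      -- implausibly old date (e.g. a 1900 default)
   OR quantity < 0                        -- negative order quantity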
Building Systematic Checks
Layer 1: Schema-Level Tests (dbt Generic Tests)
If you're using dbt for transformations, you already have a testing framework built in. The four generic tests cover the basics:
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: status
        tests:
          - accepted_values:
              values: ['pending', 'processing', 'shipped', 'delivered', 'cancelled']
These run during every dbt build and block downstream models when they fail. That's the key difference from ad-hoc checks - failures stop the pipeline, not just generate a warning.
Layer 2: Custom Business Logic Tests
Generic tests catch structural issues. Custom tests catch business logic violations. In dbt, these are SQL files in your tests/ directory that return failing rows:
SELECT order_id, order_total, line_item_sum
FROM (
    SELECT
        o.order_id,
        o.order_total,
        SUM(li.amount) AS line_item_sum
    FROM {{ ref('fct_orders') }} o
    JOIN {{ ref('fct_line_items') }} li ON o.order_id = li.order_id
    GROUP BY 1, 2
) totals
WHERE ABS(order_total - line_item_sum) > 0.01
Other custom tests worth implementing: daily row count within expected range (catches both missing and duplicated loads), metric values within N standard deviations of the 30-day average, and cross-table consistency checks.
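As a sketch of the first of these, here is what a daily row count range check can look like as a dbt singular test. The expected range (40,000 to 60,000 rows) and the order_date filter are assumptions to replace with your own load volumes, and date arithmetic syntax varies slightly by warehouse:

-- Fails (returns a row) when yesterday's load falls outside the expected range.
-- The 40K-60K bounds are placeholders; derive them from your historical volumes.
WITH yesterday AS (
    SELECT COUNT(*) AS row_count
    FROM {{ ref('fct_orders') }}
    WHERE order_date = CURRENT_DATE - 1
)
SELECT row_count
FROM yesterday
WHERE row_count NOT BETWEEN 40000 AND 60000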
Layer 3: Automated Profiling and Anomaly Detection
Tests catch known failure modes. Profiling catches unknown ones. Automated profiling monitors:
- Volume trends: Did the daily load drop by 40%? That's probably not normal.
- Null rate spikes: A column that's normally 2% null suddenly hitting 35% means something broke upstream (see the query sketch after this list).
- Value distribution shifts: If "status" values suddenly include a new category you didn't expect, that's either a schema change nobody communicated or a bug.
- Schema drift: New columns appearing, types changing, columns disappearing - these break downstream models silently.
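If you want to hand-roll the null rate check before adopting a profiling tool, a minimal version looks like the sketch below. The table, column, 30-day window, and 10-percentage-point threshold are all assumptions; dedicated tools compute these baselines automatically across every column:

-- Compares yesterday's null rate for customer_id against the trailing 30-day
-- average and returns a row when it spikes by more than 10 percentage points.
WITH daily AS (
    SELECT
        order_date,
        AVG(CASE WHEN customer_id IS NULL THEN 1.0 ELSE 0.0 END) AS null_rate
    FROM fct_orders
    GROUP BY order_date
),
baseline AS (
    SELECT AVG(null_rate) AS avg_null_rate
    FROM daily
    WHERE order_date BETWEEN CURRENT_DATE - 31 AND CURRENT_DATE - 2
)
SELECT d.order_date, d.null_rate, b.avg_null_rate
FROM daily d
CROSS JOIN baseline b
WHERE d.order_date = CURRENT_DATE - 1
  AND d.null_rate > b.avg_null_rate + 0.10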
Layer 4: Freshness Monitoring
The most overlooked dimension. Your dashboard shows "data updated today" but the underlying table was actually last loaded 3 days ago because the pipeline silently failed. dbt has built-in source freshness checks:
sources:
  - name: raw_stripe
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
Alerting Strategy: Don't Page at 2am for a Warning
Not all quality issues deserve the same response. Define severity levels (a sketch of mapping them onto dbt's test severity config follows the list):
- Critical (P0): Revenue data incorrect, pipeline fully down. Alert: PagerDuty, immediate response. Who gets paged: data engineer on-call.
- High (P1): Business-critical dashboard stale or showing wrong numbers. Alert: Slack #data-incidents channel. Response within 2 hours during business hours.
- Medium (P2): Non-critical data quality degradation, freshness warning. Alert: Slack #data-quality channel. Response within 1 business day.
- Low (P3): Cosmetic issues, minor anomalies. Alert: Weekly quality digest email. Response: next sprint.
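If your checks run through dbt, its built-in test severity config is one way to wire the lower tiers into the pipeline: severity 'error' (the default) stops the build, which fits P0/P1 checks, while severity 'warn' reports the failure without blocking, which fits P2/P3. A minimal sketch, using an assumed negative-order-total check:

-- tests/warn_negative_order_totals.sql
-- 'warn' severity reports failing rows without blocking downstream models,
-- so this check can feed a Slack digest instead of paging anyone.
{{ config(severity = 'warn') }}

SELECT order_id, order_total
FROM {{ ref('fct_orders') }}
WHERE order_total < 0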
The Data Quality Scorecard
Track quality metrics over time and report to stakeholders monthly. A simple scorecard includes: tests passing rate (target: 99%+), average data freshness by source, incidents by severity over time, and mean time to detection/resolution. This transforms quality from an invisible background task into something leadership can see, fund, and prioritize.
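Here is a sketch of the first scorecard metric, assuming your tooling persists test outcomes somewhere queryable. The test_results table and its run_date/status columns are hypothetical; tools like Elementary create comparable results tables for you:

-- Monthly test pass rate for the scorecard. Table and column names are assumed.
SELECT
    DATE_TRUNC('month', run_date) AS month,
    COUNT(*) AS tests_run,
    ROUND(100.0 * SUM(CASE WHEN status = 'pass' THEN 1 ELSE 0 END) / COUNT(*), 2) AS pass_rate_pct
FROM test_results
GROUP BY 1
ORDER BY 1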
The Organizational Component
Tools alone don't create quality. You need:
- Data owners: Every critical table has a named person who's responsible for its quality. Not "the data team" - a specific person.
- SLAs: Documented expectations for freshness, completeness, and accuracy. Without SLAs, there's no shared definition of "good enough."
- Incident response: A clear process for when quality checks fail. Who investigates? How do you communicate impact to stakeholders? How do you do a post-mortem?
Tool Comparison
- dbt tests: Free, built into your dbt workflow. Great starting point but limited to test-time checks - no monitoring between runs.
- Great Expectations: Open-source Python library. Powerful, flexible, but requires engineering effort to deploy and maintain. Best for teams with Python expertise.
- Soda: Commercial product with a generous free tier. SQL-based checks with a nice UI. Good balance of power and usability.
- Monte Carlo: Commercial data observability platform. Automated anomaly detection, lineage, incident management. Best for enterprise teams with budget ($50K+/year).
- Elementary: dbt-native observability. Runs inside your existing dbt project. Good fit if you're already deep in the dbt ecosystem.
The False Positive Gotcha
False positives kill trust in quality alerts faster than actual quality issues. If your Slack channel fires 15 alerts a day and 13 of them are noise, people stop checking the channel within 2 weeks. Then the 2 real issues also get ignored. Treat every false positive as a bug: either tighten the threshold, add an exception, or remove the check. A small suite of reliable alerts is worth more than a comprehensive suite nobody trusts.
Key Takeaways
- Replace ad-hoc SQL checks with systematic, automated tests that run on every pipeline execution and block bad data from reaching dashboards.
- Build four layers: schema tests (dbt generic), business logic tests (custom SQL), automated profiling (anomaly detection), and freshness monitoring.
- Focus on the 20–30 columns that drive business decisions. Testing everything creates unmanageable alert fatigue.
- Define severity levels for alerts. Revenue data down = PagerDuty. Minor anomaly = weekly digest. Don't page anyone at 2am for a P3.
- False positives are the biggest threat to your quality program. Treat every one as a bug to fix.
- Assign data owners, define SLAs, and build incident response processes. Tools without organizational commitment are shelf-ware.
Frequently Asked Questions
What are the six dimensions of data quality?
The six dimensions are: completeness (are all expected records present?), accuracy (do values reflect reality?), consistency (do related fields agree across tables?), timeliness (is data fresh enough for its use case?), uniqueness (are there duplicate records?), and validity (do values conform to expected formats and ranges?). Focus on the dimensions that matter most for your specific business context.
How do dbt tests work for data quality?
dbt has four built-in generic tests: unique, not_null, accepted_values, and relationships. You declare them in your YAML schema files and they run as SQL queries against your warehouse during every dbt build. You can also write custom tests as SQL queries that return rows when they fail. Tests block downstream models from building if they fail.
Should I test every column in my data warehouse?
No. Testing 100% of columns creates alert fatigue and maintenance overhead that kills adoption. Focus on the 20-30 columns that drive business decisions: primary keys, foreign keys, revenue figures, dates, and status fields. A focused test suite that gets maintained is worth more than a comprehensive one that gets ignored.
What is the biggest risk to a data quality program?
False positives. If your quality alerts fire for non-issues, people stop trusting them within 2-3 weeks. Once the team starts ignoring alerts, real issues slip through unnoticed. Tune your thresholds carefully, start conservative, and treat every false positive as a bug that needs fixing.