The Semantic Layer Problem: Why Your Metrics Don't Match and How dbt Metrics Can Help
Quick answer: Metrics don't match across teams because different people write different SQL to calculate the same thing. A semantic layer solves this by defining each metric once in a central location that all BI tools query. dbt's Semantic Layer (powered by MetricFlow) lets you define metrics in YAML within your dbt project, then query them from Looker, Tableau, Hex, or any tool connected to the dbt Cloud API. The hard part isn't the technology - it's getting finance, sales, and marketing to agree on the definitions.
Last updated: October 2025
The Metric Consistency Problem
Here's a scene that plays out in conference rooms every quarter: Sales says revenue is $18M. Finance says it's $17.2M. The CEO asks which number is correct. Everyone gets quiet.
The $800K gap isn't a data quality issue. Both numbers are "correct" - they're just calculated differently. Sales includes pending invoices. Finance excludes them until payment clears. Sales counts refunds against the month they were processed. Finance backdates them to the original transaction month. Same data warehouse, same tables, different SQL, different answers.
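The divergence is easy to reproduce. Here is a minimal sketch with illustrative invoice data and field names (not from any real schema), showing how two defensible aggregation rules over the same rows produce different "revenue":

```python
# Hypothetical invoice rows for one month. Both functions below are "correct"
# by their own team's rules, yet they disagree on September revenue.
invoices = [
    {"amount": 100.0, "status": "paid",    "month": "2025-09"},
    {"amount": 250.0, "status": "pending", "month": "2025-09"},
    {"amount": -50.0, "status": "refund",  "month": "2025-09", "original_month": "2025-08"},
]

def sales_revenue(rows, month):
    # Sales: include pending invoices; count refunds in the month processed.
    return sum(r["amount"] for r in rows if r["month"] == month)

def finance_revenue(rows, month):
    # Finance: exclude pending invoices until paid; backdate refunds to the
    # original transaction month.
    total = 0.0
    for r in rows:
        if r["status"] == "pending":
            continue
        effective_month = r.get("original_month", r["month"])
        if effective_month == month:
            total += r["amount"]
    return total

print(sales_revenue(invoices, "2025-09"))    # 300.0
print(finance_revenue(invoices, "2025-09"))  # 100.0
```

Neither function has a bug. The gap exists because each team encoded its own business rule in its own query, which is exactly the problem a semantic layer removes.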
This happens because metrics live inside individual SQL queries, dashboard calculations, and spreadsheet formulas scattered across the organization. There's no single authoritative definition of "revenue" that everyone queries. If you model your warehouse with star schemas, the metric layer sits on top of that foundation.
What a Semantic Layer Actually Is
A semantic layer is a centralized catalog of metric definitions that sits between your data warehouse and your BI tools. Instead of each analyst writing their own SQL to calculate MRR, churn rate, or net revenue retention, the semantic layer defines each metric exactly once. Every downstream tool - whether it's Looker, Tableau, Power BI, or a Python notebook - queries that single definition.
Think of it as a contract: "This is how we calculate revenue. Period. If you want revenue, you go through this layer. You don't write your own query."
The semantic layer doesn't replace your BI tool. It feeds your BI tool consistent, pre-defined metrics so the visualization layer doesn't also have to be the logic layer.
dbt's Approach: The Semantic Layer with MetricFlow
dbt Labs acquired Transform (the company behind MetricFlow) and integrated it into dbt Cloud as the dbt Semantic Layer. The architecture works like this:
- Define metrics in YAML within your dbt project, alongside your models
- MetricFlow compiles metric requests into optimized SQL at query time
- BI tools query metrics via the dbt Cloud Semantic Layer API (JDBC, GraphQL, or native integrations)
- The warehouse executes the SQL and returns results
The metric definitions live in your Git repository, go through pull request reviews, and are version-controlled just like your dbt models. For teams already using dbt with Snowflake, adding metric definitions is a natural extension of the existing workflow.
How MetricFlow Definitions Work
MetricFlow uses four core concepts:
- Semantic models - map to your dbt models and declare which columns are dimensions, entities, or measures
- Metrics - define business calculations using measures from semantic models (e.g., revenue = SUM of amount where status = 'paid')
- Entities - join keys that connect semantic models (e.g., customer_id links orders to customers)
- Time grains - control how metrics aggregate over time (daily, weekly, monthly, yearly)
Practical Example: Defining MRR, Churn Rate, and NRR
Here's what metric definitions look like in practice for three SaaS metrics that frequently cause cross-team arguments:
    semantic_models:
      - name: subscriptions
        model: ref('fct_subscriptions')
        defaults:
          agg_time_dimension: subscription_month
        entities:
          - name: subscription
            type: primary
            expr: subscription_id
          - name: customer
            type: foreign
            expr: customer_id
        dimensions:
          - name: subscription_month
            type: time
            type_params:
              time_granularity: month
          - name: plan_type
            type: categorical
          - name: status
            type: categorical
        measures:
          - name: mrr_amount
            agg: sum
            expr: monthly_recurring_revenue
          - name: churned_customers
            agg: sum_boolean
            expr: is_churned
            create_metric: true   # expose as a metric so derived metrics can reference it
          - name: total_customers
            agg: count_distinct
            expr: customer_id
            create_metric: true

    metrics:
      - name: mrr
        description: "Monthly Recurring Revenue. Excludes one-time fees and usage overages. Active subscriptions only."
        type: simple
        label: "MRR"
        type_params:
          measure: mrr_amount
        filter: |
          {{ Dimension('subscription__status') }} = 'active'

      - name: gross_churn_rate
        description: "Customers who churned in the period, divided by total customers at the start of the period."
        type: derived
        label: "Gross Churn Rate"
        type_params:
          expr: churned_customers / total_customers_start
          metrics:
            - name: churned_customers
            - name: total_customers
              alias: total_customers_start
              offset_window: 1 month

      - name: net_revenue_retention
        description: "NRR: current MRR from existing customers (including expansion, net of churn) as a percentage of prior-month MRR."
        type: derived
        label: "Net Revenue Retention"
        type_params:
          expr: (current_mrr / prior_mrr) * 100
          metrics:
            - name: mrr
              alias: current_mrr
            - name: mrr
              alias: prior_mrr
              offset_window: 1 month
Now when anyone queries "MRR" through any connected BI tool, they get the same definition: sum of monthly_recurring_revenue where status is active. No ambiguity. No arguments. One analyst can't accidentally include trial accounts while another excludes them.
BI Tool Integrations
The dbt Semantic Layer currently integrates with several BI and analytics tools through the dbt Cloud API:
- Looker - native integration that lets Looker Explores query dbt metrics directly
- Tableau - connects via JDBC to query metrics as if they were database tables
- Hex - native Python/SQL notebooks with dbt metric querying built in
- Mode - SQL-based analytics with Semantic Layer connectivity
- Google Sheets - for the stakeholders who will always prefer spreadsheets
The integration pattern is the same: the BI tool sends a metric query (which metric, which dimensions, which time grain) to the dbt Cloud API. MetricFlow compiles it to SQL. The warehouse runs it. Results come back.
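To make that compile step concrete, here is a drastically simplified sketch of what happens between the metric request and the warehouse. All names are illustrative; real MetricFlow also resolves joins across semantic models, offsets, and grain conversion:

```python
# Toy "semantic layer" compiler: a metric request (metric name, dimensions,
# time grain) becomes warehouse SQL from a single central definition.
METRIC_DEFS = {
    "mrr": {
        "measure": "SUM(monthly_recurring_revenue)",
        "model": "fct_subscriptions",
        "filter": "status = 'active'",
    }
}

def compile_request(metric, dimensions, time_grain):
    d = METRIC_DEFS[metric]
    # Truncate the time dimension to the requested grain.
    time_col = f"DATE_TRUNC('{time_grain}', subscription_month)"
    group_cols = [time_col] + dimensions
    return (
        f"SELECT {', '.join(group_cols)}, {d['measure']} AS {metric} "
        f"FROM {d['model']} "
        f"WHERE {d['filter']} "
        f"GROUP BY {', '.join(group_cols)}"
    )

sql = compile_request("mrr", ["plan_type"], "month")
print(sql)
```

Every tool that asks for `mrr` by `plan_type` at month grain gets this same SQL, which is why the numbers match across Looker, Tableau, and a notebook.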
Alternatives to dbt's Semantic Layer
dbt isn't the only option for centralized metric definitions:
- Looker's LookML - the original code-based semantic layer. Powerful, but it locks metric definitions inside the Looker ecosystem. If you leave Looker, you leave your metric layer behind.
- Cube (formerly Cube.js) - open-source headless BI with its own semantic layer that works with any front-end. Strong API-first approach.
- AtScale - enterprise semantic layer that presents metrics as virtual OLAP cubes. Works with Excel, Power BI, and Tableau without changing end-user workflows.
Each approach has trade-offs. LookML is mature but vendor-locked. Cube is flexible but requires JavaScript/TypeScript skills. AtScale targets enterprises willing to pay for an additional infrastructure layer. dbt's advantage is that metric definitions live alongside your transformation code in the same Git repo.
Building Metrics the Right Way
Define Grain First
Before writing any metric YAML, answer: at what grain does this metric exist? MRR is monthly by definition. Daily active users is daily. Revenue per order is at the order grain. If you get the grain wrong, your aggregations will double-count or miss data.
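The double-counting failure mode is easy to see with illustrative data. MRR exists at the monthly grain, so aggregating across the whole table without respecting that grain counts recurring customers once per month they appear:

```python
# Subscription rows at monthly grain (illustrative data).
rows = [
    {"customer": "a", "month": "2025-08", "mrr": 100},
    {"customer": "a", "month": "2025-09", "mrr": 100},
    {"customer": "b", "month": "2025-09", "mrr": 200},
]

# Wrong: ignoring the monthly grain counts customer "a" twice.
naive_mrr = sum(r["mrr"] for r in rows)

# Right: MRR is defined per month, so aggregate within one month.
sept_mrr = sum(r["mrr"] for r in rows if r["month"] == "2025-09")

print(naive_mrr)  # 400
print(sept_mrr)   # 300
```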
Use Consistent Filters
If "active customers" means status = 'active' AND plan != 'trial', that filter must be identical everywhere the metric appears. The semantic layer enforces this. Document the filter logic in the metric's description field so anyone reviewing the YAML understands the business rule.
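The principle is the same one you would apply in any codebase: encode the predicate once and reuse it, rather than re-typing it in every query. A minimal sketch with illustrative field names:

```python
# "Active customer" is defined exactly once; every metric reuses it, so no
# dashboard can silently include trial accounts.
def is_active(customer):
    return customer["status"] == "active" and customer["plan"] != "trial"

customers = [
    {"id": 1, "status": "active",  "plan": "pro"},
    {"id": 2, "status": "active",  "plan": "trial"},  # excluded everywhere
    {"id": 3, "status": "churned", "plan": "pro"},
]

active_count = sum(1 for c in customers if is_active(c))
print(active_count)  # 1
```

In MetricFlow the equivalent is a shared filter expression in the metric YAML; the point is that the rule lives in one reviewed place.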
Document Business Rules
Every metric definition should include a plain-English description that a non-technical stakeholder can read. "MRR excludes one-time fees and usage overages" is more useful than just "sum of monthly_recurring_revenue." The description is the contract between data engineering and the business, and well-documented metrics are a prerequisite for coherent dashboard design.
The Organizational Challenge: Harder Than the Code
Here's the part that technical articles skip: getting finance, sales, and marketing to agree on metric definitions is 80% of the work. The YAML is the easy part.
Sales wants revenue to include committed contracts. Finance wants it to match GAAP recognition rules. Marketing wants to attribute revenue to campaigns. Each team has legitimate reasons for their definition. The semantic layer doesn't resolve the disagreement - it forces the disagreement to happen upfront, in a pull request review, instead of silently producing conflicting dashboards for months.
Start with 5-10 core metrics that the executive team cares about. Get written sign-off on definitions. Then expand. Trying to define 200 metrics on day one is a project management disaster. Implementing proper data access controls alongside your semantic layer ensures the right people see the right metrics.
When You Don't Need a Formal Semantic Layer
Not every team needs one. If you have:
- A small team (fewer than 5 people) where one person writes all the SQL
- A single BI tool with all metric logic defined inside it
- Simple metrics that don't require cross-team agreement
- No history of "our numbers don't match" conversations
Then a formal semantic layer adds overhead without proportional benefit. A well-maintained dbt project with clear documentation in your schema.yml files might be sufficient. The semantic layer becomes essential when multiple people write independent queries against the same data, or when multiple BI tools need to show the same numbers.
The Gotcha: dbt Cloud Requirement
The dbt Semantic Layer requires a dbt Cloud Team or Enterprise plan. It's not available in dbt Core. You can write metric YAML files in dbt Core for documentation purposes, but the API that lets BI tools query those metrics - the part that makes the semantic layer actually useful - is a paid dbt Cloud feature. For teams running dbt Core on a tight budget, this is a meaningful constraint. Evaluate whether Cube or a warehouse-native approach (like Snowflake's upcoming semantic layer features) fits better.
Key Takeaways
- Metrics don't match because different people write different SQL - not because the data is wrong
- A semantic layer defines each metric once, and every downstream tool queries that definition
- dbt Semantic Layer uses MetricFlow to compile YAML metric definitions into warehouse SQL at query time
- Metric definitions live in Git alongside your dbt models, go through pull request reviews, and are version-controlled
- The organizational challenge (getting teams to agree on definitions) is harder than the technical implementation
- Start with 5-10 core executive metrics; don't try to define everything on day one
- dbt Semantic Layer requires dbt Cloud Team/Enterprise - it's not available in dbt Core
- Small teams with a single SQL author and one BI tool can skip the semantic layer
Frequently Asked Questions
Q: What is a semantic layer in data analytics?
A semantic layer is a centralized definition of business metrics (revenue, churn, MRR, etc.) that sits between your data warehouse and your BI tools. Instead of each analyst writing their own SQL to calculate revenue, the semantic layer defines it once, and every tool queries that single definition. This eliminates the "my numbers don't match yours" problem.
Q: Is dbt Semantic Layer available in dbt Core?
No. The dbt Semantic Layer with MetricFlow integration requires a dbt Cloud Team or Enterprise plan. You can define metrics in dbt Core YAML files for documentation purposes, but the API that lets BI tools query those metrics directly is a dbt Cloud-only feature.
Q: How does MetricFlow work with dbt?
MetricFlow is the computation engine behind the dbt Semantic Layer. You define metrics, dimensions, entities, and time grains in YAML files within your dbt project. When a BI tool queries a metric through the dbt Cloud Semantic Layer API, MetricFlow compiles the request into optimized SQL and runs it against your warehouse.
Q: When do you not need a formal semantic layer?
If you have a small team (under 5 people) where one analyst writes all the SQL and everyone trusts the same dashboards, a formal semantic layer adds overhead without proportional benefit. The metric consistency problem becomes acute when multiple people write independent queries against the same data.