Semantic Layer and dbt Metrics Guide

The Semantic Layer Problem: Why Your Metrics Don't Match and How dbt Metrics Can Help

Celestinfo Software Solutions Pvt. Ltd. Sep 18, 2025

Quick answer: Metrics don't match across teams because different people write different SQL to calculate the same thing. A semantic layer solves this by defining each metric once in a central location that all BI tools query. dbt's Semantic Layer (powered by MetricFlow) lets you define metrics in YAML within your dbt project, then query them from Looker, Tableau, Hex, or any tool connected to the dbt Cloud API. The hard part isn't the technology - it's getting finance, sales, and marketing to agree on the definitions.

Last updated: October 2025

The Metric Consistency Problem

Here's a scene that plays out in conference rooms every quarter: Sales says revenue is $18M. Finance says it's $17.2M. The CEO asks which number is correct. Everyone gets quiet.

The $800K gap isn't a data quality issue. Both numbers are "correct" - they're just calculated differently. Sales includes pending invoices. Finance excludes them until payment clears. Sales counts refunds against the month they were processed. Finance backdates them to the original transaction month. Same data warehouse, same tables, different SQL, different answers.

This happens because metrics live inside individual SQL queries, dashboard calculations, and spreadsheet formulas scattered across the organization. There's no single authoritative definition of "revenue" that everyone queries from. For organizations building their data models with star schemas, the metric layer sits on top of that foundation.


What a Semantic Layer Actually Is


A semantic layer is a centralized catalog of metric definitions that sits between your data warehouse and your BI tools. Instead of each analyst writing their own SQL to calculate MRR, churn rate, or net revenue retention, the semantic layer defines each metric exactly once. Every downstream tool - whether it's Looker, Tableau, Power BI, or a Python notebook - queries that single definition.

Think of it as a contract: "This is how we calculate revenue. Period. If you want revenue, you go through this layer. You don't write your own query."

The semantic layer doesn't replace your BI tool. It feeds your BI tool consistent, pre-defined metrics so the visualization layer doesn't also have to be the logic layer.


dbt's Approach: The Semantic Layer with MetricFlow


dbt Labs acquired Transform (the company behind MetricFlow) and integrated it into dbt Cloud as the dbt Semantic Layer. The architecture works like this:

  1. Define metrics in YAML within your dbt project, alongside your models
  2. MetricFlow compiles metric requests into optimized SQL at query time
  3. BI tools query metrics via the dbt Cloud Semantic Layer API (JDBC, GraphQL, or native integrations)
  4. The warehouse executes the SQL and returns results

The metric definitions live in your Git repository, go through pull request reviews, and are version-controlled just like your dbt models. For teams already using dbt with Snowflake, adding metric definitions is a natural extension of the existing workflow.
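As an illustration of step 3, dbt Cloud's JDBC interface accepts metric queries written as a templated select. A sketch querying a metric named mrr by month might look like the following (the exact template syntax varies by dbt version, so treat this as indicative rather than definitive):

```sql
-- Illustrative JDBC query against the dbt Cloud Semantic Layer.
-- MetricFlow expands this template into warehouse SQL at query time.
select *
from {{
  semantic_layer.query(
    metrics=['mrr'],
    group_by=[Dimension('metric_time').grain('month')]
  )
}}
```

The BI tool never sees the underlying tables; it only names the metric, the dimensions, and the time grain.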


How MetricFlow Definitions Work

MetricFlow uses four core concepts, all of which appear in the example below:

  • Entities - the keys that identify and join semantic models (e.g., subscription_id, customer_id)
  • Dimensions - the attributes you can group and filter by, including time (e.g., plan_type, subscription_month)
  • Measures - aggregations over a model's columns (e.g., sum of monthly_recurring_revenue)
  • Metrics - the business definitions built on measures, optionally with filters and time offsets


Practical Example: Defining MRR, Churn Rate, and NRR

Here's what metric definitions look like in practice for three SaaS metrics that frequently cause cross-team arguments:

YAML - semantic_models/subscriptions.yml
semantic_models:
  - name: subscriptions
    defaults:
      agg_time_dimension: subscription_month
    model: ref('fct_subscriptions')
    entities:
      - name: subscription
        type: primary
        expr: subscription_id
      - name: customer
        type: foreign
        expr: customer_id
    dimensions:
      - name: subscription_month
        type: time
        type_params:
          time_granularity: month
      - name: plan_type
        type: categorical
      - name: status
        type: categorical
    measures:
      - name: mrr_amount
        agg: sum
        expr: monthly_recurring_revenue
      - name: churned_customers
        agg: sum_boolean
        expr: is_churned
      - name: total_customers
        agg: count_distinct
        expr: customer_id
YAML - metrics/saas_metrics.yml
metrics:
  - name: mrr
    description: "Monthly Recurring Revenue. Excludes one-time fees and usage overages."
    type: simple
    label: "MRR"
    type_params:
      measure: mrr_amount
    filter: |
      {{ Dimension('subscription__status') }} = 'active'

  - name: churned_customers
    description: "Count of customers flagged as churned in the period."
    type: simple
    label: "Churned Customers"
    type_params:
      measure: churned_customers

  - name: total_customers
    description: "Distinct customers in the period."
    type: simple
    label: "Total Customers"
    type_params:
      measure: total_customers

  - name: gross_churn_rate
    description: "Percentage of customers who cancelled in a period, relative to the prior period's customer count."
    type: derived
    label: "Gross Churn Rate"
    type_params:
      expr: churned_customers / total_customers_prev_month
      metrics:
        - name: churned_customers
        - name: total_customers
          offset_window: 1 month
          alias: total_customers_prev_month

  - name: net_revenue_retention
    description: "NRR: current MRR from existing customers, including expansion and churn, as a percentage of MRR one month earlier."
    type: derived
    label: "Net Revenue Retention"
    type_params:
      expr: (current_mrr / prior_mrr) * 100
      metrics:
        - name: mrr
          alias: current_mrr
        - name: mrr
          offset_window: 1 month
          alias: prior_mrr
Now when anyone queries "MRR" through any connected BI tool, they get the same definition: sum of monthly_recurring_revenue where status is active. No ambiguity. No arguments. One analyst can't accidentally include trial accounts while another excludes them.
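MetricFlow does this compilation for you, but the arithmetic the MRR and churn definitions above encode can be sketched in plain Python (the sample data is invented for illustration):

```python
# Invented sample rows at the fct_subscriptions grain (one row per
# subscription per month); column meanings follow the YAML above.
rows = [
    # (customer_id, month, status, monthly_recurring_revenue, is_churned)
    ("c1", "2025-01", "active",    100, False),
    ("c2", "2025-01", "active",    200, False),
    ("c3", "2025-01", "cancelled",   0, True),
    ("c1", "2025-02", "active",    100, False),
    ("c2", "2025-02", "cancelled",   0, True),
]

def mrr(month):
    # Simple metric: sum(monthly_recurring_revenue) where status = 'active'.
    return sum(r[3] for r in rows if r[1] == month and r[2] == "active")

def gross_churn_rate(month, prior_month):
    # Derived metric: customers who churned this month divided by the
    # distinct customer count one month earlier (the offset_window).
    churned = sum(1 for r in rows if r[1] == month and r[4])
    total = len({r[0] for r in rows if r[1] == prior_month})
    return churned / total

print(mrr("2025-01"))                                    # 300
print(round(gross_churn_rate("2025-02", "2025-01"), 3))  # 0.333
```

The point of the semantic layer is that nobody writes this logic by hand twice: the filter and the offset live in one place.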


BI Tool Integrations


The dbt Semantic Layer integrates with a range of BI and analytics tools through the dbt Cloud API - natively with tools such as Tableau, Hex, Google Sheets, and Mode, and with anything else through the JDBC and GraphQL endpoints.

The integration pattern is the same: the BI tool sends a metric query (which metric, which dimensions, which time grain) to the dbt Cloud API. MetricFlow compiles it to SQL. The warehouse runs it. Results come back.


Alternatives to dbt's Semantic Layer


dbt isn't the only option for centralized metric definitions:

  • LookML (Looker) - the longest-established metric modeling language, tied to the Looker platform
  • Cube - an open-source semantic layer with its own APIs, configured in JavaScript/TypeScript
  • AtScale - an enterprise semantic layer that runs as a separate infrastructure tier between warehouse and BI tools

Each approach has trade-offs. LookML is mature but vendor-locked. Cube is flexible but requires JavaScript/TypeScript skills. AtScale targets enterprises willing to pay for an additional infrastructure layer. dbt's advantage is that metric definitions live alongside your transformation code in the same Git repo.


Building Metrics the Right Way


Define Grain First

Before writing any metric YAML, answer: at what grain does this metric exist? MRR is monthly by definition. Daily active users is daily. Revenue per order is at the order grain. If you get the grain wrong, your aggregations will double-count or miss data.
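A tiny, invented example of what getting the grain wrong looks like: joining order-grain totals to item-grain rows before aggregating fans out each order and double-counts revenue.

```python
# Invented data: revenue exists at the order grain, items at the item grain.
orders = [("o1", 100), ("o2", 50)]                    # (order_id, order_total)
items = [("o1", "widget"), ("o1", "gadget"), ("o2", "bolt")]

# Correct: aggregate at the grain where the measure lives (one row per order).
correct = sum(total for _, total in orders)

# Wrong: joining to items first fans o1 out into two rows, so its
# $100 total is counted twice.
fanned_out = sum(total for oid, total in orders
                 for item_oid, _ in items if item_oid == oid)

print(correct, fanned_out)  # 150 250
```

Declaring the grain up front (as the agg_time_dimension and entities do in the semantic model) is what prevents this class of error.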

Use Consistent Filters

If "active customers" means status = 'active' AND plan != 'trial', that filter must be identical everywhere the metric appears. The semantic layer enforces this. Document the filter logic in the metric's description field so anyone reviewing the YAML understands the business rule.
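As a hypothetical illustration of the same idea outside the semantic layer, centralizing the predicate in one function is the only way multiple hand-written reports stay in sync:

```python
def is_active_customer(row):
    # The one authoritative definition: status = 'active' AND plan != 'trial'.
    return row["status"] == "active" and row["plan"] != "trial"

# Invented sample customers.
customers = [
    {"id": 1, "status": "active", "plan": "pro"},
    {"id": 2, "status": "active", "plan": "trial"},    # excluded: trial plan
    {"id": 3, "status": "cancelled", "plan": "pro"},   # excluded: not active
]

# Any number of "reports" filter through the same function, so their
# counts cannot drift apart.
active_count = sum(1 for c in customers if is_active_customer(c))
print(active_count)  # 1
```

The YAML filter block on a metric plays exactly this role for every BI tool downstream.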

Document Business Rules

Every metric definition should include a plain-English description that a non-technical stakeholder can read. "MRR excludes one-time fees and usage overages" is more useful than just "sum of monthly_recurring_revenue." The description is the contract between data engineering and the business, and well-documented metrics are a prerequisite for sound dashboard design.


The Organizational Challenge: Harder Than the Code


Here's the part that technical articles skip: getting finance, sales, and marketing to agree on metric definitions is 80% of the work. The YAML is the easy part.

Sales wants revenue to include committed contracts. Finance wants it to match GAAP recognition rules. Marketing wants to attribute revenue to campaigns. Each team has legitimate reasons for their definition. The semantic layer doesn't resolve the disagreement - it forces the disagreement to happen upfront, in a pull request review, instead of silently producing conflicting dashboards for months.

Start with 5-10 core metrics that the executive team cares about. Get written sign-off on definitions. Then expand. Trying to define 200 metrics on day one is a project management disaster. Implementing proper data access controls alongside your semantic layer ensures the right people see the right metrics.


When You Don't Need a Formal Semantic Layer


Not every team needs one. If you have:

  • A small team (under five people) where one analyst writes all the SQL
  • A single BI tool that everyone treats as the source of truth
  • A handful of dashboards rather than dozens of independently written queries

Then a formal semantic layer adds overhead without proportional benefit. A well-maintained dbt project with clear documentation in your schema.yml files might be sufficient. The semantic layer becomes essential when multiple people write independent queries against the same data, or when multiple BI tools need to show the same numbers.


The Gotcha: dbt Cloud Requirement


The dbt Semantic Layer requires a dbt Cloud Team or Enterprise plan. It's not available in dbt Core. You can write metric YAML files in dbt Core for documentation purposes, but the API that lets BI tools query those metrics - the part that makes the semantic layer actually useful - is a paid dbt Cloud feature. For teams running dbt Core on a tight budget, this is a meaningful constraint. Evaluate whether Cube or a warehouse-native approach (like Snowflake's upcoming semantic layer features) fits better.


Key Takeaways

  • Metrics don't match because different people write different SQL - not because the data is wrong
  • A semantic layer defines each metric once, and every downstream tool queries that definition
  • dbt Semantic Layer uses MetricFlow to compile YAML metric definitions into warehouse SQL at query time
  • Metric definitions live in Git alongside your dbt models, go through pull request reviews, and are version-controlled
  • The organizational challenge (getting teams to agree on definitions) is harder than the technical implementation
  • Start with 5-10 core executive metrics; don't try to define everything on day one
  • dbt Semantic Layer requires dbt Cloud Team/Enterprise - it's not available in dbt Core
  • Small teams with a single SQL author and one BI tool can skip the semantic layer
CelestInfo Engineering Team

Data engineering and cloud consulting that ships. We build production data platforms on Snowflake, Azure, AWS, and Power BI from Sagar Nagar, Vizag 530045.


Frequently Asked Questions

Q: What is a semantic layer in data analytics?

A semantic layer is a centralized definition of business metrics (revenue, churn, MRR, etc.) that sits between your data warehouse and your BI tools. Instead of each analyst writing their own SQL to calculate revenue, the semantic layer defines it once, and every tool queries that single definition. This eliminates the "my numbers don't match yours" problem.

Q: Is dbt Semantic Layer available in dbt Core?

No. The dbt Semantic Layer with MetricFlow integration requires a dbt Cloud Team or Enterprise plan. You can define metrics in dbt Core YAML files for documentation purposes, but the API that lets BI tools query those metrics directly is a dbt Cloud-only feature.

Q: How does MetricFlow work with dbt?

MetricFlow is the computation engine behind the dbt Semantic Layer. You define metrics, dimensions, entities, and time grains in YAML files within your dbt project. When a BI tool queries a metric through the dbt Cloud Semantic Layer API, MetricFlow compiles the request into optimized SQL and runs it against your warehouse.

Q: When do you not need a formal semantic layer?

If you have a small team (under 5 people) where one analyst writes all the SQL and everyone trusts the same dashboards, a formal semantic layer adds overhead without proportional benefit. The metric consistency problem becomes acute when multiple people write independent queries against the same data.
