Reverse ETL: Pushing Data from Your Warehouse Back to Salesforce, HubSpot, and Beyond
Quick answer: Reverse ETL syncs modeled data from your data warehouse (Snowflake, BigQuery, Redshift) back to operational tools like Salesforce, HubSpot, and Zendesk. You write a SQL query in the warehouse to define the data, map columns to destination fields, and schedule syncs. Tools like Census, Hightouch, and Polytomic handle the API integration. Don't build it yourself - API rate limits, error handling, and idempotency are harder than they look.
Last updated: October 2025
Introduction
Your data warehouse has the best version of your customer data. Product usage metrics, support ticket history, billing status, lead scores from your ML model - it's all there, cleaned and joined. But your sales team lives in Salesforce. Marketing lives in HubSpot. Support lives in Zendesk. They can't write SQL, and they shouldn't have to. Reverse ETL bridges this gap by pushing warehouse-computed data to the tools where teams actually do their work.
This isn't a new idea - teams have been exporting CSVs from warehouses and uploading them to Salesforce for years. What's new is that dedicated Reverse ETL tools automate the entire flow: query the warehouse, detect changes, handle API pagination and rate limits, retry failures, and log everything. For a broader view of how Reverse ETL fits into the integration landscape, see our 5 Data Integration Patterns guide.
Why Reverse ETL Exists
The modern data stack created an ironic problem. Fivetran and Hevo pull data into your warehouse from 200+ sources. dbt transforms it into clean, well-tested models. Your warehouse is the single source of truth. But all that work is trapped behind a SQL interface that only the data team uses.
Meanwhile, the people who make revenue-impacting decisions - sales reps, campaign managers, support agents - work in SaaS tools that have no direct connection to the warehouse. They're making decisions based on stale or incomplete data, or worse, manually maintaining their own spreadsheets. Reverse ETL solves this by pushing the warehouse's enriched data to where it's actually needed.
How Reverse ETL Works
The workflow has 4 steps:
- Define a model (SQL query). Write a SELECT statement in your warehouse that produces the rows and columns you want to sync. This query runs in Snowflake, BigQuery, or Redshift - wherever your modeled data lives.
- Map columns to destination fields. Tell the Reverse ETL tool which warehouse column maps to which field in the destination. For example, `warehouse.lead_score` maps to `Salesforce.Lead.Lead_Score__c`.
- Configure sync behavior. Choose how records are matched (by email? Salesforce ID? HubSpot contact ID?) and what happens on conflicts (upsert, create-only, update-only).
- Schedule syncs. Run the sync hourly, daily, or on a cron schedule. The tool diffs the current query results against the last sync, and only sends changed records to the destination.
```sql
-- This query powers a Census sync to Salesforce
-- It pushes lead scores and product usage data to Lead records
SELECT
    l.salesforce_lead_id  AS lead_id,
    s.lead_score          AS lead_score__c,
    s.last_product_login  AS last_product_login__c,
    s.features_used_count AS features_used__c,
    s.plan_name           AS current_plan__c
FROM analytics.mart_lead_scoring s
JOIN raw_salesforce.leads l
    ON s.email = l.email
WHERE s.lead_score IS NOT NULL;
```
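The column-mapping step amounts to renaming warehouse columns to the destination's field API names and dropping everything else. A minimal Python sketch of that idea, with a hypothetical mapping (real tools configure this in a UI, and the Salesforce field names here are illustrative):

```python
# Map warehouse column names to Salesforce field API names.
# The mapping itself is hypothetical; a Reverse ETL tool stores
# this configuration per sync.
FIELD_MAP = {
    "lead_score": "Lead_Score__c",
    "last_product_login": "Last_Product_Login__c",
    "features_used_count": "Features_Used__c",
}

def to_destination_payload(row: dict) -> dict:
    """Rename mapped columns; silently drop anything not in the mapping."""
    return {dest: row[src] for src, dest in FIELD_MAP.items() if src in row}

row = {"lead_id": "00Q123", "lead_score": 87, "last_product_login": "2025-10-01"}
print(to_destination_payload(row))
# {'Lead_Score__c': 87, 'Last_Product_Login__c': '2025-10-01'}
```

Dropping unmapped columns at this layer is deliberate: it keeps the payload to exactly the fields the destination needs, which matters once API limits come into play.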
Reverse ETL Tools
| Tool | Warehouse Support | Destinations | Differentiator |
|---|---|---|---|
| Census | Snowflake, BigQuery, Redshift, Databricks | 150+ (Salesforce, HubSpot, Google Ads, Braze, Zendesk) | Strong dbt integration, audience builder for marketing teams |
| Hightouch | Snowflake, BigQuery, Redshift, Databricks, PostgreSQL | 140+ (Salesforce, Marketo, Intercom, Amplitude) | Visual audience builder, customer studio for non-technical users |
| Polytomic | Snowflake, BigQuery, Redshift, PostgreSQL | 50+ (Salesforce, HubSpot, Notion, Airtable) | Bi-directional syncs, built-in data mapping UI |
| Fivetran (built-in) | Snowflake, BigQuery, Redshift | Growing list | Single platform for both ETL and Reverse ETL |
| Hevo (built-in) | Snowflake, BigQuery, Redshift | Growing list | No-code setup, part of existing Hevo pipeline |
Use Cases That Actually Work
1. Lead Scoring Pushed to Salesforce
Your data science team builds a lead scoring model in the warehouse using product signups, website behavior, company firmographics, and email engagement. The model runs nightly, scoring every lead. Reverse ETL pushes the score to Salesforce's Lead_Score__c field. Sales reps see the score right on the lead record when they open it - no dashboard required.
2. Customer Health Scores to Zendesk
Combine product usage data (DAU, feature adoption, error rates), billing status (MRR, payment failures), and support history (open tickets, CSAT scores) into a single health score in the warehouse. Push it to Zendesk as a custom field. When a support agent opens a ticket, they immediately see whether this is a healthy customer or one at risk of churning - and can adjust their response accordingly.
3. Product Usage Data to HubSpot
Marketing wants to segment users by product behavior: power users, trial users who haven't activated, users who stopped logging in. These segments live in the warehouse as dbt models. Reverse ETL pushes segment membership to HubSpot contact properties, enabling targeted email campaigns without marketing ever touching SQL.
4. Ad Audience Syncs to Google/Meta
Build a warehouse model that identifies high-value customers or lookalike seed audiences. Push those email lists to Google Ads Customer Match or Meta Custom Audiences. The sync runs daily, so your ad targeting always reflects the latest warehouse data instead of a stale CSV uploaded 3 weeks ago.
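On the Google side specifically, Customer Match expects SHA-256 hashes of normalized (whitespace-trimmed, lowercased) email addresses rather than raw emails. A minimal sketch of that normalization step; the upload itself goes through the Google Ads API and is omitted here:

```python
import hashlib

def normalize_and_hash(email: str) -> str:
    """Google Customer Match accepts hex-encoded SHA-256 hashes of
    lowercased, whitespace-trimmed email addresses."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

# Normalization makes "  Jane.Doe@Example.com " and "jane.doe@example.com"
# hash to the same value, so list membership stays stable across syncs.
print(normalize_and_hash("  Jane.Doe@Example.com "))
```

Reverse ETL tools do this hashing for you; if you build the sync yourself, skipping normalization is the classic bug that silently shrinks your matched audience.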
Building It Yourself vs Buying
The temptation to build your own Reverse ETL is strong. "It's just API calls, right?" Here's what you'll actually need to build:
- Change detection. You can't send all 500,000 contacts to Salesforce every hour. You need to diff the current query results against the last sync and only send changed records. This means storing sync state somewhere reliable.
- API rate limit handling. Salesforce standard edition caps at 15,000 API calls per 24-hour period. HubSpot's batch API handles 100 records per call with 100 calls per 10 seconds. You need backoff, retry, and queuing logic.
- Idempotency. If a sync fails halfway through, you need to resume without creating duplicate records. This means upsert logic based on external IDs, not blind inserts.
- Error handling. When Salesforce rejects 47 out of 10,000 records because of a validation rule, you need to log those failures, surface them to users, and make them retryable.
- Field mapping and type coercion. Your warehouse has `TIMESTAMP_NTZ`. Salesforce expects `DateTime` in ISO 8601. HubSpot wants epoch milliseconds. Every destination has its own type system.
- Schema drift. When someone adds a new required field in Salesforce, your sync breaks. You need monitoring and alerting for schema changes in both source and destination.
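The change-detection and idempotency points can be sketched together, because keying the diff by the destination's external ID is exactly what makes retries safe. A minimal Python sketch, assuming sync state is held in an in-memory dict (a real implementation persists it in a warehouse table or key-value store); field names and destinations are illustrative:

```python
from datetime import datetime, timezone

def diff_rows(current: dict[str, dict], last_synced: dict[str, dict]) -> list[dict]:
    """Return only rows that are new or changed since the last sync.
    Keying by the destination's external ID keeps retries idempotent:
    re-sending a changed row is an upsert, never a duplicate insert."""
    return [row for key, row in current.items() if last_synced.get(key) != row]

def coerce_timestamp(dt: datetime, destination: str):
    """The same value needs a different wire format per destination."""
    if destination == "salesforce":   # ISO 8601 string
        return dt.astimezone(timezone.utc).isoformat()
    if destination == "hubspot":      # epoch milliseconds
        return int(dt.timestamp() * 1000)
    raise ValueError(f"unknown destination: {destination}")

last = {"00Q1": {"lead_score": 80}, "00Q2": {"lead_score": 55}}
now  = {"00Q1": {"lead_score": 87}, "00Q2": {"lead_score": 55}, "00Q3": {"lead_score": 91}}
print(diff_rows(now, last))  # [{'lead_score': 87}, {'lead_score': 91}]
```

The HTTP layer is deliberately omitted: rate-limit backoff, queuing, and partial-failure retry sit on top of this diff, and that is where the commercial tools earn their keep.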
For 1–2 simple destinations with small volumes, building might work. For anything more, Census or Hightouch pays for itself in engineering time saved within the first month. They've already solved these problems across 150+ integrations.
Data Modeling for Reverse ETL
Reverse ETL is only as good as the data you feed it. Build dedicated "sync-ready" mart tables in dbt with these principles:
- Clear primary keys that match the destination. If you're syncing to Salesforce, the primary key should be the Salesforce record ID (not your internal user ID). If matching by email, make sure emails are deduplicated and lowercased.
- Only the columns you're syncing. Don't expose your entire customer model. Include only the 5–10 fields that the destination actually needs. This reduces sync volume and API calls.
- Flat structure. No nested JSON, no arrays. SaaS APIs expect flat key-value pairs. If your warehouse model has nested structures, flatten them in the mart layer.
- dbt tests for sync readiness. Add `unique` and `not_null` tests on primary keys. Add `accepted_values` tests on enum fields that map to picklists in the destination. A failing dbt test should block the sync.
```sql
-- models/marts/sync_salesforce_leads.sql
-- This model powers the Census sync to Salesforce Lead records
{{ config(materialized='table') }}

SELECT
    sf.id                     AS salesforce_lead_id,
    COALESCE(s.lead_score, 0) AS lead_score,
    s.last_login_at           AS last_product_login,
    s.total_features_used     AS features_used_count,
    s.current_plan            AS plan_name,
    CURRENT_TIMESTAMP()       AS _synced_at
FROM {{ ref('mart_lead_scoring') }} s
INNER JOIN {{ ref('stg_salesforce__leads') }} sf
    ON LOWER(s.email) = LOWER(sf.email)
WHERE sf.is_converted = FALSE
  AND s.lead_score IS NOT NULL
```
Common Mistakes
- Syncing too many fields. Every extra field increases API call size and the chance of validation errors. Start with 3–5 high-value fields (lead score, health score, last login) and expand based on actual business team requests.
- Syncing too frequently. Hourly syncs sound great until you hit Salesforce's 15,000 API call daily limit by 3 PM. Match sync frequency to how often the downstream team actually checks the data. For most use cases, daily or every 6 hours is plenty.
- No monitoring. Reverse ETL failures are silent by default. If the sync breaks on a Tuesday and nobody notices until the following Monday, sales reps have been working with week-old data. Set up Slack alerts for sync failures and record count anomalies.
- Syncing raw data instead of modeled data. Reverse ETL doesn't fix messy data - it amplifies it. If your warehouse model has duplicate customer records, you'll create duplicate records in Salesforce. Clean the data in dbt first.
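On the monitoring point, a record-count anomaly check is cheap to build. A sketch with an illustrative threshold, assuming you log each sync's record count somewhere queryable:

```python
def count_anomaly(current: int, history: list[int], tolerance: float = 0.5) -> bool:
    """Flag a sync whose record count deviates more than `tolerance`
    (as a fraction) from the recent average. A sudden drop to near zero
    usually means the upstream model broke, not that nothing changed."""
    if not history:
        return False                     # no baseline yet
    avg = sum(history) / len(history)
    if avg == 0:
        return current != 0
    return abs(current - avg) / avg > tolerance

print(count_anomaly(120, [1000, 980, 1020]))  # True: sudden drop, alert
print(count_anomaly(990, [1000, 980, 1020]))  # False: within normal range
```

Wire the `True` case to a Slack webhook and you've covered the most common silent-failure mode for a few lines of code.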
API Rate Limits You'll Hit
| Destination | Rate Limit | Practical Impact |
|---|---|---|
| Salesforce (Standard) | 15,000 API calls / 24 hours | ~15,000 record updates per day using single-record API; bulk API handles 10,000 per batch |
| HubSpot | 100 records/call, 100 calls/10s | ~36,000 records/hour with batch API |
| Zendesk | 400 requests/minute (Team), 700 (Professional) | Use bulk API for updates over 100 records |
| Google Ads | 15,000 operations per mutate request | Customer Match lists can take 24–48 hours to populate after upload |
| Intercom | 1,000 requests/minute | Contact updates are rate-limited per workspace |
The gotcha: Salesforce's 15,000-call daily limit on standard edition is the one that bites most teams. If you're syncing 50,000 leads daily, you'll need Salesforce Enterprise edition (which raises the limit) or the Salesforce Bulk API (which Census and Hightouch support natively). HubSpot's batch API maxes out at 100 records per call - not per second, per call. For large contact lists, budget for sync windows of 30–60 minutes.
Key Takeaways
- Reverse ETL pushes warehouse-computed data (lead scores, health scores, segments) to SaaS tools where business teams work.
- The workflow is: SQL query in warehouse, column mapping, schedule sync. Tools like Census and Hightouch handle the API complexity.
- Don't build your own unless you have very simple requirements. API rate limits, idempotency, and error handling are harder than expected.
- Model your data for sync readiness in dbt: flat structure, clear primary keys, only the columns you need, and tests to ensure data quality.
- Start with 3–5 high-value fields and daily syncs. Expand based on what business teams actually use, not what seems comprehensive.
- Monitor sync failures actively. Silent failures mean stale data, which is worse than no data.
Q: What is Reverse ETL?
Reverse ETL is the process of syncing modeled, enriched data from your data warehouse (Snowflake, BigQuery, Redshift) back to operational SaaS tools like Salesforce, HubSpot, Zendesk, and Google Ads. It closes the loop between analytics and operations by putting warehouse-computed insights where business teams actually work.
Q: Should I build my own Reverse ETL or buy a tool?
Buy, unless you have very few destinations and simple sync requirements. Building your own means handling API rate limits, pagination, error retries, idempotency, field mapping, and schema drift for every destination. Tools like Census and Hightouch have already solved these problems across 150+ integrations.
Q: What are the API rate limits for common Reverse ETL destinations?
Salesforce standard edition allows 15,000 API calls per 24-hour period. HubSpot's batch API handles 100 records per call with 100 calls per 10 seconds. Google Ads allows 15,000 operations per mutate request. These limits constrain how much data you can sync and how frequently.
Q: How do I model data for Reverse ETL in dbt?
Build dedicated "sync-ready" mart tables in dbt with clear primary keys that match the destination's identifier (e.g., Salesforce record ID or email). Include only the columns you're syncing. Keep the model flat - no nested objects. Add dbt tests to ensure primary key uniqueness and required fields are never null.