Setting Up Hevo Data with Snowflake: A No-Code Pipeline That Actually Works
Quick answer: Hevo Data is a managed ELT platform with 150+ connectors that loads data into Snowflake without code. Create a dedicated HEVO_LOADER role in Snowflake with minimal permissions, connect your sources through Hevo's web UI, and data starts flowing within hours. Watch your warehouse sizing during historical loads.
Last updated: October 2025
Hevo Data is a managed ELT platform that connects your data sources to Snowflake without requiring you to write extraction code. It sits in the same category as Fivetran and Airbyte - pre-built connectors, automatic schema mapping, change data capture. The difference is in pricing, connector quality, and a few operational details that matter more than you'd think.
This guide walks through connecting Hevo to Snowflake from scratch: setting up Snowflake permissions correctly, creating your first pipeline, understanding how Hevo handles schema changes, and knowing where the sharp edges are. We'll also compare it to Fivetran and Airbyte so you can make an informed choice.
Why Hevo Data
Hevo's value proposition is simple: you shouldn't need a data engineer to move data from Stripe to Snowflake. For a lot of teams, especially early-stage companies or small data teams, that's exactly right. You connect a source, point it at Snowflake, and Hevo handles the extraction schedule, schema mapping, incremental loads, and error recovery.
The specific features that matter:
- 150+ pre-built connectors for SaaS apps (Stripe, HubSpot, Zendesk, Salesforce), databases (PostgreSQL, MySQL, MongoDB), and file sources (S3, GCS, SFTP).
- Log-based CDC for databases. Hevo reads from PostgreSQL WAL logs and MySQL binlogs, so it captures all changes without polling. This is the right way to do CDC, though it requires specific database configuration.
- Automatic schema drift handling. When a source adds a column, Hevo adds it to the Snowflake destination table. When a type changes, Hevo pauses and notifies you.
- Event-based pricing. You pay per million events (rows loaded), not per connector or per monthly active row. For high-volume sources with low change rates, this can be significantly cheaper than Fivetran.
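Because pricing is per event, it pays to estimate your volumes before committing. As a rough sanity check for a PostgreSQL source, you can read approximate live row counts from the planner statistics; this is an estimate of the initial historical load, not billed events per se:

```sql
-- Rough estimate of initial-load event volume for a PostgreSQL source.
-- n_live_tup is a planner estimate, not an exact count, but it's close
-- enough for back-of-the-envelope pricing math.
SELECT relname    AS table_name,
       n_live_tup AS approx_rows
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
```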
Step 1: Configure Snowflake for Hevo
Before you touch the Hevo dashboard, set up a dedicated Snowflake role for Hevo. Don't use ACCOUNTADMIN or SYSADMIN - give Hevo the minimum permissions it needs. Here's the SQL:
```sql
-- Create a dedicated role
CREATE ROLE IF NOT EXISTS HEVO_LOADER;

-- Create a dedicated warehouse (XSMALL is fine to start)
CREATE WAREHOUSE IF NOT EXISTS HEVO_WH WITH
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 120
    AUTO_RESUME = TRUE
    MAX_CLUSTER_COUNT = 1;

-- Grant warehouse access
GRANT USAGE ON WAREHOUSE HEVO_WH TO ROLE HEVO_LOADER;

-- Create a target database and schema
CREATE DATABASE IF NOT EXISTS RAW_DATA;
CREATE SCHEMA IF NOT EXISTS RAW_DATA.HEVO;

-- Grant database and schema permissions
GRANT USAGE ON DATABASE RAW_DATA TO ROLE HEVO_LOADER;
GRANT USAGE ON SCHEMA RAW_DATA.HEVO TO ROLE HEVO_LOADER;
GRANT CREATE TABLE ON SCHEMA RAW_DATA.HEVO TO ROLE HEVO_LOADER;
GRANT SELECT ON ALL TABLES IN SCHEMA RAW_DATA.HEVO TO ROLE HEVO_LOADER;

-- Create a dedicated user
CREATE USER IF NOT EXISTS HEVO_USER
    PASSWORD = 'your-strong-password-here'
    DEFAULT_ROLE = HEVO_LOADER
    DEFAULT_WAREHOUSE = HEVO_WH;

GRANT ROLE HEVO_LOADER TO USER HEVO_USER;
```
Two things to note: `MAX_CLUSTER_COUNT = 1` prevents the warehouse from auto-scaling during Hevo's historical backfill, which can otherwise spike your credits unexpectedly. And `AUTO_SUSPEND = 120` (2 minutes) is a reasonable balance between keeping the warehouse warm for frequent loads and not burning credits when idle.
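Before handing the credentials to Hevo, it's worth a quick sanity check that the role and user came out as intended. These are standard Snowflake commands:

```sql
-- Verify the role has exactly the grants above, nothing more
SHOW GRANTS TO ROLE HEVO_LOADER;

-- Verify the user's defaults point at the Hevo role and warehouse
DESCRIBE USER HEVO_USER;
```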
Step 2: Create a Pipeline in Hevo
Hevo's pipeline creation is a wizard-style flow. You pick a source, configure credentials, select tables/objects, pick a destination, and map the schema. The whole thing takes 10-30 minutes depending on the source complexity.
Here's the general flow for a common source like PostgreSQL:
- Source configuration: Enter your PostgreSQL host, port, database, and credentials. Hevo tests the connection before proceeding.
- Replication mode: Choose between table-based replication (polls for changes using a timestamp column) or log-based CDC (reads WAL logs). Log-based is better for accuracy, but requires `wal_level = logical` on your Postgres instance. If you're on AWS RDS, this is a parameter group change that requires a reboot. See the readiness check just after this list.
- Table selection: Pick which tables to replicate. You can select all or choose specific ones. For each table, you specify the primary key and the load type (full load or incremental).
- Destination configuration: Enter the Snowflake account URL, the HEVO_USER credentials, and the target database/schema. Hevo verifies the connection and permissions.
- Schema mapping: Hevo auto-maps source columns to Snowflake types. VARCHAR maps to VARCHAR, INTEGER to NUMBER, TIMESTAMP to TIMESTAMP_NTZ. You can override mappings if needed, but the defaults are usually correct.
- Activate: Start the pipeline. Hevo begins with a historical load (full snapshot of existing data), then switches to incremental mode.
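If you plan to use log-based CDC, check the source database before activating the pipeline. Here's a minimal readiness check on PostgreSQL (Hevo's documentation covers the full requirements):

```sql
-- Must return 'logical' for log-based CDC to work
SHOW wal_level;

-- Hevo needs a free replication slot; check the limit and what's in use
SHOW max_replication_slots;
SELECT slot_name, plugin, active FROM pg_replication_slots;
```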
How Hevo Handles Schema Changes
Schema drift is one of those problems that seems minor until it breaks your pipeline at 2am. Hevo handles it in two ways:
- New columns: When the source adds a column, Hevo automatically adds it to the Snowflake destination table as a nullable column. Existing rows get NULL for the new column. This happens silently - no pipeline pause, no manual intervention. You'll see it in the Hevo activity log.
- Type changes: When a column's data type changes in the source (e.g., INTEGER to VARCHAR), Hevo pauses replication for that specific column and sends you an email notification. You review the change in the dashboard and either approve it (Hevo alters the column type in Snowflake) or reject it (Hevo drops the column from replication). This is the right behavior - silently changing a column type could break downstream dbt models or BI dashboards.
Dropped columns in the source are not dropped in Snowflake. The column stays, and new rows get NULL values. This is intentional - you might still need the historical data in that column.
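Conceptually, the new-column case behaves as if the DDL below ran against the destination. The table and column names here are hypothetical; Hevo issues the equivalent change itself:

```sql
-- Illustration only: Hevo performs the equivalent of this automatically.
-- Source added a coupon_code column (hypothetical example).
ALTER TABLE RAW_DATA.HEVO.ORDERS ADD COLUMN COUPON_CODE VARCHAR;

-- Rows loaded before the change read back NULL for the new column
SELECT COUNT(*) AS pre_change_rows
FROM RAW_DATA.HEVO.ORDERS
WHERE COUPON_CODE IS NULL;
```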
Monitoring Pipeline Health
Hevo's dashboard shows pipeline status, ingestion lag, event counts, and error logs. Set up these alerts at minimum:
- Pipeline failure alert: Email and Slack notification when a pipeline stops due to an error. This catches things like expired API tokens, database permission revocations, and network issues.
- Ingestion lag alert: Notify when data hasn't been loaded for longer than your expected sync interval. If your pipeline syncs every 15 minutes and the lag exceeds 30 minutes, something's wrong.
- Schema change alert: The default email notifications for type changes are fine, but also route them to a Slack channel so the data team sees them in real time.
On the Snowflake side, monitor the HEVO_WH warehouse usage with Snowflake's WAREHOUSE_METERING_HISTORY view. This tells you exactly how many credits Hevo is consuming and whether you need to adjust the warehouse size or auto-suspend settings.
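A starting point for that check, using the ACCOUNT_USAGE schema (data there lags real time by up to a few hours):

```sql
-- Daily credit burn for the Hevo warehouse over the last 30 days
SELECT DATE_TRUNC('day', start_time) AS usage_day,
       SUM(credits_used)             AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE warehouse_name = 'HEVO_WH'
  AND start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1;
```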
Hevo vs Fivetran vs Airbyte
All three tools solve the same problem. The differences are in pricing, connector quality, and operational model.
| | Hevo Data | Fivetran | Airbyte |
|---|---|---|---|
| Pricing model | Events (rows loaded) | Monthly Active Rows | Free (OSS) / rows (Cloud) |
| Connector count | 150+ | 300+ | 350+ (community) |
| Hosting | Fully managed | Fully managed | Self-hosted or Cloud |
| CDC support | Log-based (PG, MySQL) | Log-based (PG, MySQL, SQL Server) | Log-based (varies by connector) |
| Schema drift | Auto-add columns, pause on type changes | Auto-add columns, auto-widen types | Varies by connector |
| Transformations | Basic (included) | dbt Cloud (extra cost) | dbt Core integration |
| Best for | Cost-sensitive teams, high-volume sources | Enterprise teams, wide connector needs | Teams with engineering capacity |
Our take: If you're a startup or mid-size company with 5-15 data sources and a small data team, Hevo is often the right choice. It's cheaper than Fivetran for most workloads, and you don't need the engineering resources that Airbyte's self-hosted option demands. If you need 200+ connectors or enterprise-grade SLAs, Fivetran is the safer bet. If you have engineers who want full control and don't mind maintaining infrastructure, Airbyte's open-source version is hard to beat on flexibility.
Common Gotchas
- Historical loads can spike Snowflake credits. When you first connect a source, Hevo backfills all historical data. If the source has millions of rows, this initial load can run for hours and auto-scale your warehouse (if you didn't set `MAX_CLUSTER_COUNT = 1`). Always cap the warehouse size and cluster count before starting a new pipeline.
- PostgreSQL CDC requires `wal_level = logical`. This isn't enabled by default on most managed PostgreSQL services. On AWS RDS, you need to change the parameter group and reboot the instance. On Azure Database for PostgreSQL, it's a server parameter change. If your Postgres provider doesn't support `wal_level = logical`, you'll have to fall back to table-based replication, which is less accurate for deletes and updates.
- Hevo's free tier is limited to 1 million events/month. That sounds like a lot, but a single PostgreSQL table with 500K rows uses half your quota on the initial full load alone. The paid tiers start at a reasonable price point, but calculate your expected event volume before committing.
- Hevo adds metadata columns. Every table loaded by Hevo gets extra columns like `__hevo_id`, `__loaded_at`, and `__modified_at`. These are useful for debugging but can surprise downstream consumers who don't expect them. Account for these in your dbt models, as in the sketch after this list.
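One low-effort pattern is to strip the metadata columns in a dbt staging model so downstream models never see them. A sketch, assuming a hypothetical `orders` table and the metadata column names listed above (verify against your actual tables); Snowflake's `EXCLUDE` keyword does the work:

```sql
-- models/staging/stg_orders.sql (hypothetical model name)
-- EXCLUDE drops Hevo's metadata columns without enumerating
-- every business column by hand.
SELECT * EXCLUDE (__hevo_id, __loaded_at, __modified_at)
FROM RAW_DATA.HEVO.ORDERS
```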
Key Takeaways
- Create a dedicated `HEVO_LOADER` role in Snowflake with minimal permissions. Don't reuse SYSADMIN.
- Set `MAX_CLUSTER_COUNT = 1` on the Hevo warehouse to prevent credit spikes during historical loads.
- Use log-based CDC for databases whenever possible. It's more accurate than table-based replication for capturing updates and deletes.
- Hevo auto-handles new columns but pauses on type changes. Set up Slack alerts so the team notices quickly.
- Compare pricing carefully. Hevo's event-based model can be cheaper than Fivetran for high-volume sources, but more expensive for sources with many small tables.
Frequently Asked Questions
Q: What is Hevo Data and how does it work with Snowflake?
Hevo Data is a managed, no-code ELT platform with 150+ pre-built connectors. It extracts data from sources like PostgreSQL, Stripe, and HubSpot, and loads it into Snowflake with automatic schema mapping, CDC, and schema drift handling. You configure pipelines through a web UI without writing code.
Q: How does Hevo Data compare to Fivetran?
Both are managed ELT platforms. Fivetran has more connectors and a longer track record. Hevo's pricing is event-based (rows loaded) rather than monthly active rows, which can be cheaper for high-volume, low-change-rate sources. Hevo includes basic transformations in its standard plan; Fivetran charges extra for dbt Cloud.
Q: What Snowflake permissions does Hevo Data need?
Create a dedicated HEVO_LOADER role with USAGE on the target warehouse, USAGE on the target database, USAGE and CREATE TABLE on the target schema. Don't give it ACCOUNTADMIN or SYSADMIN.
Q: Does Hevo Data handle schema changes automatically?
New columns are auto-added to Snowflake as nullable columns. Type changes trigger a pause and email notification so you can review before approving. Dropped source columns remain in Snowflake with NULL values for new rows.
