Snowflake Iceberg Tables: Why Open Table Formats Are Changing Everything

Celestinfo Software Solutions Pvt. Ltd. Mar 04, 2026

Quick answer: Apache Iceberg is an open table format that stores your data as Parquet files with rich metadata, making it accessible to any compatible engine. Snowflake now offers Iceberg managed tables (Public Preview) with liquid clustering and predictive optimization for best-in-class performance. The Iceberg REST Catalog API provides read access (GA) and write access (Preview) from external engines like Databricks, Trino, and Amazon EMR. For most Snowflake-only workloads, native tables still win on simplicity. But if you need multi-engine access or want to avoid vendor lock-in, Iceberg tables are the path forward.

Last updated: March 2026

The Vendor Lock-In Problem That Iceberg Solves

For years, every data platform stored data in its own proprietary format. Snowflake has its FDN format. Databricks has Delta Lake. Google BigQuery uses Capacitor. Your data goes in, and it does not come out easily.

That was fine when teams used one platform for everything. But modern data architectures rarely work that way. Your analytics team runs SQL in Snowflake. Your data science team runs Spark on Databricks. Your ML engineers use Amazon EMR. And all of them need access to the same data.

Without open formats, you end up copying data between platforms. That means duplicate storage costs, synchronization headaches, and the inevitable moment when two copies disagree with each other. It is expensive, fragile, and frustrating.

Apache Iceberg fixes this. It is an open table format that sits on top of cloud object storage (S3, Azure Blob, GCS) and makes your data readable by any engine that speaks Iceberg. Write once, query from anywhere. That is the promise, and in 2026, it is actually delivering.

What Apache Iceberg Actually Is

At its core, Iceberg is a specification for how to organize data files and metadata on object storage. Think of it as a contract between the data producer and the data consumer.

The data itself is stored as Parquet files, which is already the industry standard for columnar data storage. What Iceberg adds is a sophisticated metadata layer that tracks:

- The table schema, and every change made to it over time
- Snapshots of the table state at each commit, enabling time travel and rollback
- The exact set of data files that make up each snapshot
- Column-level statistics (such as min and max values) for each data file
- Partition layout, decoupled from the physical directory structure

This metadata is what makes Iceberg powerful. A query engine can read the metadata, figure out exactly which Parquet files contain relevant data, and skip everything else. That is how you get good performance on tables with thousands of files.
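To make that pruning step concrete, here is a simplified Python sketch of how an engine can use per-file min/max statistics (as recorded in Iceberg manifest entries) to skip irrelevant Parquet files. The `event_date` column and file paths are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class DataFile:
    """Per-file column statistics, as an Iceberg manifest entry records them."""
    path: str
    min_event_date: str  # min/max stats for a hypothetical 'event_date' column
    max_event_date: str

def prune_files(files, lo, hi):
    """Keep only files whose [min, max] range overlaps the query predicate.

    This mirrors, in simplified form, what an Iceberg-aware engine does with
    manifest statistics before reading any Parquet data.
    """
    return [f for f in files if f.max_event_date >= lo and f.min_event_date <= hi]

files = [
    DataFile("s3://bucket/a.parquet", "2026-01-01", "2026-01-31"),
    DataFile("s3://bucket/b.parquet", "2026-02-01", "2026-02-28"),
    DataFile("s3://bucket/c.parquet", "2026-03-01", "2026-03-31"),
]

# Query: WHERE event_date BETWEEN '2026-02-10' AND '2026-02-20'
survivors = prune_files(files, "2026-02-10", "2026-02-20")
print([f.path for f in survivors])  # only b.parquet needs to be scanned
```

The same idea scales to thousands of files: the engine touches only metadata until it knows exactly which data files matter.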

How Snowflake Supports Iceberg

Snowflake's Iceberg support comes in two flavors, each designed for a different use case.

Iceberg Managed Tables (Public Preview)

This is the big one. With Iceberg managed tables, Snowflake manages the Iceberg catalog, handles file compaction, and runs its optimization engine, all while storing data in the open Iceberg format on your cloud storage.

What makes this interesting is that you get Snowflake's full optimization stack applied to open-format data. That includes:

- Liquid clustering, which continuously reorganizes file layout around your actual query patterns
- Predictive optimization, which decides when to compact and recluster without manual tuning
- Automatic file compaction and metadata maintenance handled by Snowflake

The result is best-in-class price and performance on data that is not locked into Snowflake's proprietary format. Your data lives as standard Parquet and Iceberg metadata on your own cloud storage. If you ever want to query it from Databricks, Trino, or any other Iceberg-compatible engine, you can.

For teams running Snowflake as their primary platform but wanting an exit strategy or multi-engine flexibility, this is a significant development.

Externally Managed Iceberg Tables

The second option is for scenarios where another system manages the Iceberg catalog. Maybe Databricks writes data using Unity Catalog, and you want to query it from Snowflake without copying anything.

Snowflake can create external Iceberg tables that point to Iceberg data managed by another catalog. You define the table in Snowflake, tell it where the Iceberg metadata lives, and Snowflake reads the data directly. No data movement required.
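As a sketch of what that definition looks like, the helper below renders the DDL for an externally managed Iceberg table. The EXTERNAL_VOLUME, CATALOG, and METADATA_FILE_PATH clauses follow Snowflake's documented syntax; every identifier and path here is a placeholder, and in practice you would run the statement directly in a Snowflake worksheet:

```python
def external_iceberg_ddl(table, external_volume, catalog_integration, metadata_path):
    """Render Snowflake DDL for an Iceberg table whose catalog lives elsewhere.

    All names are placeholders; the clause structure follows Snowflake's
    documented CREATE ICEBERG TABLE syntax for externally managed catalogs.
    """
    return (
        f"CREATE ICEBERG TABLE {table}\n"
        f"  EXTERNAL_VOLUME = '{external_volume}'\n"
        f"  CATALOG = '{catalog_integration}'\n"
        f"  METADATA_FILE_PATH = '{metadata_path}';"
    )

ddl = external_iceberg_ddl(
    "analytics.events",
    "my_s3_volume",        # external volume pointing at your bucket
    "unity_catalog_int",   # catalog integration for the external catalog
    "events/metadata/v42.metadata.json",
)
print(ddl)
```

The key detail is the metadata file path: Snowflake reads the Iceberg metadata in place, so no data moves.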

This is read-only today for externally managed catalogs, but it solves a real problem. Your data science team can produce datasets in Databricks, and your analytics team can query them in Snowflake, without any ETL pipeline between them.

The Iceberg REST Catalog API

The Iceberg REST Catalog API is where interoperability gets practical. This API allows external engines to discover and access Iceberg tables managed by a catalog service.

Here is the current status:

- Read access: Generally Available (GA). External engines such as Databricks, Trino, and Amazon EMR can discover and query Snowflake-managed Iceberg tables.
- Write access: in Preview. External engines can also commit changes back to the same tables through the catalog.

What this means practically: you can have Snowflake manage your Iceberg tables while allowing Databricks to read from (and eventually write to) the same tables. The catalog API handles the coordination, ensuring consistent metadata across engines.

This is a big deal for enterprises running both Databricks and Snowflake. Instead of maintaining separate copies of the same data and building sync pipelines, you point both engines at the same Iceberg tables.
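The coordination flow can be sketched as the sequence of REST calls an external engine makes. The path shapes follow the open Iceberg REST catalog specification; the host and warehouse prefix below are placeholder assumptions:

```python
# Minimal sketch of the Iceberg REST Catalog endpoints an external engine
# calls to discover and load a table. Path shapes follow the open Iceberg
# REST catalog specification; host and prefix are placeholders.
BASE = "https://catalog.example.com/api"  # hypothetical catalog endpoint
PREFIX = "my_warehouse"                   # prefix returned by the config call

def config_url():
    # Step 1: fetch catalog configuration (including the prefix for later calls)
    return f"{BASE}/v1/config"

def list_tables_url(namespace):
    # Step 2: enumerate tables in a namespace
    return f"{BASE}/v1/{PREFIX}/namespaces/{namespace}/tables"

def load_table_url(namespace, table):
    # Step 3: load table metadata (schema, snapshots, data file locations)
    return f"{BASE}/v1/{PREFIX}/namespaces/{namespace}/tables/{table}"

print(load_table_url("analytics", "events"))
```

Because every engine speaks the same endpoints, the catalog can hand out consistent metadata no matter who is asking.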

Delta Sharing for Iceberg (Private Preview)

Delta Sharing extends the data sharing model to Iceberg tables. Currently in Private Preview, it enables organizations to share Iceberg-formatted data securely across organizational boundaries.

The concept is straightforward. A data provider publishes an Iceberg table as a shared dataset. A data consumer accesses that shared dataset using any Iceberg-compatible engine. No data copying. No file transfers. The consumer reads directly from the provider's storage with managed access controls.

For industries where data sharing is critical, like financial services, healthcare, and supply chain, this opens up collaboration patterns that were previously impractical. You can share data without worrying about whether the consumer uses Snowflake, Databricks, or something else entirely.

Native Snowflake Tables vs Iceberg Tables: When to Use Each

This is the practical question every team asks. The honest answer: it depends on your architecture.

Use Native Snowflake Tables When:

- Your data lives entirely within the Snowflake ecosystem
- You want maximum query performance with zero management overhead
- Your workloads involve high-frequency, small-batch writes such as CDC

Use Iceberg Tables When:

- You need multi-engine access to the same data (Databricks, Spark, Trino, Amazon EMR)
- You want an exit strategy from vendor lock-in, with data on your own cloud storage
- You need to share data across platforms or organizations without copying it

Most organizations we work with across our data engineering engagements start with native Snowflake tables and selectively adopt Iceberg for tables that need cross-platform access. There is no need to convert everything at once.

Performance: What to Expect

Let us be honest about performance. Snowflake's native format has more than a decade of optimization work behind it. Iceberg managed tables with liquid clustering and predictive optimization are closing the gap fast, but there are still differences to be aware of.

For read-heavy analytical queries, Iceberg managed tables perform within 10 to 20 percent of native tables in most benchmarks. The liquid clustering feature is especially effective here because it continuously optimizes file layout based on your actual query patterns.

For write-heavy workloads with frequent small inserts, native tables still have an edge. Iceberg's metadata management adds overhead per write operation that native tables avoid. If you are doing high-frequency CDC (change data capture) writes, test both approaches with your actual workload before committing.

For large batch writes and ETL operations, the performance difference is negligible. The bottleneck is typically the data processing and transformation logic, not the table format.

One area where Iceberg managed tables actually excel is in cost optimization. Because Iceberg data is stored as standard Parquet on your own cloud storage, you have more control over storage costs, lifecycle policies, and cross-region replication compared to Snowflake-managed storage.

Migration Considerations

Thinking about migrating existing native tables to Iceberg? Here is what to plan for.

Start with New Tables

The lowest-risk approach is to create new tables as Iceberg from the start. This avoids the complexity of data migration while letting your team build operational experience with the format.

Identify Multi-Engine Tables First

Survey your existing tables and identify which ones are accessed by or needed by systems outside Snowflake. Those are your migration candidates. Tables that only Snowflake touches can stay native indefinitely.

Test Performance With Your Workload

Create an Iceberg copy of a candidate table, run your typical queries against both versions, and compare. Every workload is different, and benchmarks only tell you so much. Your actual query patterns, data volumes, and concurrency levels determine real-world performance.
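A minimal harness for that comparison might look like the sketch below. The two callables are stand-in stubs; in a real test each would execute the same SQL against the native and Iceberg copies of the table:

```python
import statistics
import time

def time_query(run_query, repeats=5):
    """Time a query callable several times and report the median, which is
    more robust to warehouse warm-up and caching effects than a single run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Stand-in stubs: replace each with a callable that runs your actual SQL
# against the native table and the Iceberg copy respectively.
native_median = time_query(lambda: sum(range(10_000)))
iceberg_median = time_query(lambda: sum(range(10_000)))
print(f"native {native_median:.6f}s vs iceberg {iceberg_median:.6f}s")
```

Run the comparison at realistic concurrency, not just single queries, since concurrency behavior is where formats can diverge.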

Plan for Operational Differences

Iceberg tables require awareness of file compaction, snapshot management, and metadata maintenance. Snowflake handles most of this automatically for managed tables, but your monitoring and alerting should track table health metrics like file count, snapshot count, and average file size.
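As an illustration, a health check over those metrics might look like this sketch. The thresholds are illustrative assumptions, not Snowflake recommendations:

```python
def table_health(file_sizes_bytes, snapshot_count,
                 max_files=10_000, min_avg_mb=32, max_snapshots=500):
    """Compute the health metrics mentioned above and flag anything that
    suggests compaction or snapshot expiration is overdue.

    Thresholds are illustrative defaults; tune them for your own tables.
    """
    file_count = len(file_sizes_bytes)
    avg_mb = (sum(file_sizes_bytes) / file_count / 1_048_576) if file_count else 0.0
    alerts = []
    if file_count > max_files:
        alerts.append("too many data files: compaction may be lagging")
    if file_count and avg_mb < min_avg_mb:
        alerts.append("small average file size: many tiny files")
    if snapshot_count > max_snapshots:
        alerts.append("snapshot count high: consider expiring old snapshots")
    return {"file_count": file_count, "avg_file_mb": round(avg_mb, 1),
            "snapshot_count": snapshot_count, "alerts": alerts}

# Example: 2,000 files of ~8 MB each and 120 snapshots triggers the
# small-file alert, a common symptom of frequent small inserts.
report = table_health([8 * 1_048_576] * 2000, snapshot_count=120)
print(report["alerts"])
```

Feeding these numbers into your existing alerting keeps a slow compaction backlog from silently degrading query performance.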

Update Your Data Pipelines

If you use tools like dbt, ensure they support Iceberg table creation and management. Most modern ELT tools have added Iceberg support, but verify compatibility before migrating production pipelines.

The Bigger Picture: Why Open Formats Win

Iceberg is part of a larger trend in the data industry: the shift from proprietary platforms to open, interoperable components. This is not just philosophical. It has practical consequences for every data team.

When your data is in open formats, you negotiate from a position of strength. Cloud vendor pricing getting too aggressive? You can move. A better query engine comes along? You can adopt it without a six-month data migration. Your organization acquires a company that uses a different platform? Their data is already compatible.

Delta Lake and Apache Iceberg are the two dominant open table formats in 2026, with Iceberg gaining broader industry support because of its engine-agnostic design. Databricks created Delta Lake but has also added strong Iceberg compatibility through Unity Catalog. Snowflake, AWS, Google, and the broader open-source ecosystem have rallied around Iceberg as the neutral standard.

For a deeper look at how this fits into the broader modern data stack, our recent overview covers the full picture.


Key Takeaways

- Apache Iceberg stores data as standard Parquet files plus open metadata, readable by any compatible engine.
- Snowflake's Iceberg managed tables (Public Preview) apply liquid clustering and predictive optimization to open-format data on your own cloud storage.
- The Iceberg REST Catalog API gives external engines read access (GA) and write access (Preview) to the same tables.
- Native tables remain the simpler, faster choice for Snowflake-only workloads; adopt Iceberg selectively where cross-platform access matters.
- Start with new tables, test against your real workload, and migrate multi-engine tables first.

Chakri, Intern

Chakri is an intern at CelestInfo working on Snowflake and cloud data platforms. He contributes to data engineering projects and writes technical content on modern data solutions.

Burning Questions About Snowflake Iceberg Tables

Quick answers to what teams ask us most

What are Snowflake Iceberg Tables?

Snowflake Iceberg Tables are tables that use the Apache Iceberg open table format instead of Snowflake's proprietary FDN format. They store data as Parquet files with Iceberg metadata, making the data accessible to any engine that supports Iceberg, including Databricks, Trino, and Amazon EMR. Snowflake offers both managed Iceberg tables (where Snowflake handles compaction and optimization) and externally managed Iceberg tables (where the catalog is managed by another system).

When should you use native Snowflake tables versus Iceberg tables?

Use native Snowflake tables when your data lives entirely within the Snowflake ecosystem and you want maximum query performance with zero management overhead. Use Iceberg tables when you need multi-engine access (querying the same data from Databricks, Spark, or Trino), want to avoid vendor lock-in, or need to share data across platforms without copying it. For most single-platform Snowflake deployments, native tables are still the pragmatic choice.

Can Snowflake and Databricks both access the same Iceberg table?

Yes. This is one of the primary benefits of Iceberg. Both Snowflake and Databricks can read from and write to Iceberg tables stored in cloud object storage. The Iceberg REST Catalog API enables read access (GA) and write access (Preview) from external engines. This means you can write data with Spark on Databricks and query it from Snowflake, or vice versa, without duplicating data.

What is liquid clustering?

Liquid clustering is Snowflake's approach to data organization that replaces traditional static partitioning. Instead of defining fixed partition columns upfront, liquid clustering continuously optimizes data layout based on actual query patterns. This is available for Iceberg managed tables and delivers better performance without requiring you to predict and maintain partition schemes manually.

What is Delta Sharing for Iceberg?

Delta Sharing for Iceberg, currently in Private Preview, enables secure data sharing across organizations using the Iceberg format. It allows providers to share Iceberg tables with consumers who can read the data using any Iceberg-compatible engine. This extends the cross-platform data sharing model beyond proprietary formats and supports multi-cloud, multi-engine environments.

Ready? Let's Talk!

Get expert insights and answers tailored to your business requirements and transformation.