Databricks Unity Catalog: Centralized Governance for Your Lakehouse

Celestinfo Software Solutions Pvt. Ltd. Sep 04, 2025

Quick answer: Unity Catalog is Databricks' centralized governance layer that replaces per-workspace Hive Metastores with a single control plane. It gives you a 3-level namespace (catalog.schema.table), cross-workspace access control via SQL GRANT statements, automatic column-level lineage, and built-in Delta Sharing for external data distribution. You'll need a Premium or Enterprise Databricks plan to use it.

Last updated: September 2025

Introduction

Most Databricks deployments start simple: one workspace, one Hive Metastore, a handful of engineers who know where everything lives. Then the second workspace appears. Then the third. Suddenly the data science team can't access the tables the data engineering team built, nobody knows which tables contain PII, and the security audit turns into a scavenger hunt across 6 separate metastores. That's the problem Unity Catalog was built to solve.


Unity Catalog provides a single governance layer across all your Databricks workspaces. Instead of managing access controls, metadata, and lineage separately in each workspace, you define everything once and it applies everywhere. This guide covers how Unity Catalog works, how to set it up, and the gotchas you'll hit during migration.


The Problem: Fragmented Governance


Without Unity Catalog, every Databricks workspace gets its own Hive Metastore. That creates three specific headaches:

- Duplicated access control: permissions must be defined and kept in sync separately in every workspace, so a team granted access in one workspace may be locked out of the same data in another.
- No shared visibility: table metadata lives in per-workspace metastores, so nobody has a single view of what data exists or which tables contain PII.
- Fragmented auditing: access logs are scattered across workspaces, turning a security review into a scavenger hunt across separate metastores.



Unity Catalog replaces this fragmented model with a centralized metastore that all workspaces share. One place to define access policies. One place to track lineage. One audit log.


The 3-Level Namespace


Unity Catalog introduces a 3-level naming convention that replaces the flat database.table structure in Hive Metastore:


Unity Catalog Namespace
catalog.schema.table

-- Examples:
production.sales.orders
staging.marketing.campaigns
sandbox.data_science.churn_predictions


This hierarchy lets you grant access at any level. Give a team access to an entire catalog, a single schema, or a specific table. Permissions cascade downward: a GRANT on a catalog applies to every schema and table inside it.
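For example, one grant at the catalog level covers every schema and table inside it (the catalog and group names here are illustrative):

```sql
-- One grant at the catalog level...
GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG sandbox TO `data-science-team`;

-- ...lets the team query any table in any schema of that catalog,
-- e.g. sandbox.data_science.churn_predictions, with no per-table grants.
```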


Setting Up Unity Catalog


Setup involves 4 steps: creating a metastore, assigning it to workspaces, configuring storage credentials, and defining external locations.


Step 1: Create the Metastore


A Unity Catalog metastore is the top-level container for all your metadata. You create one metastore per region (Databricks requires the metastore and workspaces to be in the same cloud region). The metastore needs a root storage location in your cloud account - this is where managed tables store their data by default.


You create the metastore in the Databricks Account Console (not in a workspace). Only account admins can do this.


Step 2: Assign Workspaces


After creating the metastore, assign it to one or more workspaces. Each workspace can only be assigned to one metastore. Once assigned, users in that workspace can access any catalog they've been granted permissions on - regardless of which workspace created the catalog.


Step 3: Configure Storage Credentials


Storage credentials tell Unity Catalog how to authenticate with your cloud storage (S3 buckets, ADLS containers, or GCS buckets). You create an IAM role (AWS), service principal (Azure), or service account (GCP) and register it as a storage credential in Unity Catalog.


SQL - Create Storage Credential
CREATE STORAGE CREDENTIAL my_s3_credential
WITH (
  AWS_IAM_ROLE = 'arn:aws:iam::123456789012:role/unity-catalog-role'
);

-- Verify it works
VALIDATE STORAGE CREDENTIAL my_s3_credential
  ON URL 's3://my-data-bucket/unity-catalog/';

Step 4: Define External Locations


External locations map a storage credential to a specific cloud storage path. This is how you control which paths Unity Catalog can read from and write to. External tables must reference a registered external location - they can't point to arbitrary paths. This is a deliberate security constraint.


SQL - Create External Location
CREATE EXTERNAL LOCATION my_data_lake
  URL 's3://my-data-bucket/lakehouse/'
  WITH (STORAGE CREDENTIAL my_s3_credential);

-- Grant usage to a group
GRANT READ FILES ON EXTERNAL LOCATION my_data_lake
  TO `data-engineering-team`;

Managing Access with GRANT Syntax


Unity Catalog uses standard SQL GRANT/REVOKE statements for access control. Permissions follow a hierarchy of securable objects:


Metastore → Catalog → Schema → Table/View/Function


SQL - Access Control Examples
-- Grant full access to a catalog
GRANT ALL PRIVILEGES ON CATALOG production TO `data-engineering-team`;

-- Grant read-only access to a specific schema
-- (USE CATALOG on the parent catalog is also required)
GRANT USE CATALOG ON CATALOG production TO `analytics-team`;
GRANT USE SCHEMA ON SCHEMA production.sales TO `analytics-team`;
GRANT SELECT ON SCHEMA production.sales TO `analytics-team`;

-- Grant access to a single table
GRANT SELECT ON TABLE production.sales.orders TO `reporting-user`;

-- Revoke access
REVOKE SELECT ON TABLE production.sales.orders FROM `reporting-user`;

Key permissions include USE CATALOG, USE SCHEMA, SELECT, MODIFY, CREATE TABLE, and ALL PRIVILEGES. The USE permissions are required for a user to even see the catalog or schema in their workspace - without them, the object is invisible.
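To audit what has already been granted, SHOW GRANTS lists the privileges on any securable (object names here match the earlier examples):

```sql
-- All principals with privileges on a table
SHOW GRANTS ON TABLE production.sales.orders;

-- What a specific principal can do on a catalog
SHOW GRANTS `analytics-team` ON CATALOG production;
```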


Data Lineage Tracking


Unity Catalog automatically captures column-level lineage across notebooks, jobs, and Delta Live Tables (DLT) pipelines. No manual configuration required. When a notebook reads from production.sales.orders and writes to production.analytics.daily_revenue, Unity Catalog records that relationship automatically.


You can view lineage in the Databricks UI by navigating to any table and clicking the "Lineage" tab. It shows both upstream sources (where the data comes from) and downstream consumers (what depends on this table). This is invaluable during incident response: if a source table has bad data, you can immediately see every downstream table and dashboard affected.
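Lineage is also queryable as data via Databricks system tables. A sketch against system.access.table_lineage (verify the column names against the system-tables reference in your workspace) that finds direct downstream consumers of a table:

```sql
-- Tables and entities that read directly from production.sales.orders
SELECT DISTINCT target_table_full_name, entity_type
FROM system.access.table_lineage
WHERE source_table_full_name = 'production.sales.orders'
  AND target_table_full_name IS NOT NULL;
```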


Data Discovery: Search and Tagging


Unity Catalog includes a built-in search interface that indexes table names, column names, descriptions, and tags across all catalogs. Instead of asking "does anyone know where the customer churn data lives?", users can search for "churn" and find every table and column that matches.


Tags add another layer of discoverability. You can tag tables and columns with labels like pii, gdpr, finance, or deprecated. Tags are also useful for policy enforcement - you can write automation that checks whether any table tagged pii is accessible by groups that shouldn't have PII access.


SQL - Tagging
-- Tag a table
ALTER TABLE production.sales.customers
  SET TAGS ('pii', 'gdpr-relevant');

-- Tag a specific column
ALTER TABLE production.sales.customers
  ALTER COLUMN email SET TAGS ('pii-email');
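The policy automation mentioned above can start as a simple query. A sketch against the system.information_schema.table_tags view (check the exact view and column names in your workspace's information schema):

```sql
-- Find every table tagged 'pii' so its grants can be cross-checked
SELECT catalog_name, schema_name, table_name
FROM system.information_schema.table_tags
WHERE tag_name = 'pii';
```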

Audit Logging


Every access event in Unity Catalog gets logged: who queried which table, when, from which workspace. These audit logs feed into your Databricks system tables (system.access.audit) and can be exported to your SIEM or compliance tooling. For organizations subject to SOX, HIPAA, or GDPR, this is the difference between a 3-week audit preparation cycle and a 3-hour one.
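A sketch of querying those audit logs directly (the system.access.audit schema is taken from Databricks' system-tables documentation; verify field names such as user_identity.email in your workspace):

```sql
-- Unity Catalog access events from the last 7 days
SELECT event_time, user_identity.email, action_name
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND event_date >= current_date() - INTERVAL 7 DAYS
ORDER BY event_time DESC;
```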


Delta Sharing Integration


Delta Sharing is an open protocol for secure data sharing that's built into Unity Catalog. It lets you share data with external organizations without copying it. Recipients don't need a Databricks account - they can read shared data from any client that supports the Delta Sharing protocol (pandas, Spark, Power BI, Tableau).


You create a share, add tables to it, and create recipients with activation links. The recipient gets read-only access to the specific tables you've shared, and you can revoke access at any time. Data never leaves your storage - recipients read directly from your cloud storage via short-lived, pre-signed URLs.
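That flow maps to a handful of SQL statements (the share and recipient names are illustrative):

```sql
-- Create a share and add a table to it
CREATE SHARE partner_share COMMENT 'Sales data for an external partner';
ALTER SHARE partner_share ADD TABLE production.sales.orders;

-- Create a recipient (Databricks generates the activation link)
CREATE RECIPIENT acme_corp;

-- Give the recipient read-only access to the share
GRANT SELECT ON SHARE partner_share TO RECIPIENT acme_corp;
```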


Migrating from Hive Metastore


If you have existing Hive Metastore tables, Databricks provides the SYNC command to upgrade them to Unity Catalog. The migration doesn't move data - it creates Unity Catalog entries that point to the same underlying storage.


SQL - Migrate Hive Tables
-- Sync an entire schema from Hive Metastore to Unity Catalog
SYNC SCHEMA production.sales
  FROM hive_metastore.sales_db;

-- Sync a single table
SYNC TABLE production.sales.orders
  FROM hive_metastore.sales_db.orders;

During migration, you can run both Hive Metastore and Unity Catalog in parallel. Tables are accessible through both paths, giving teams time to update their notebook references from hive_metastore.sales_db.orders to production.sales.orders. Note that SYNC only upgrades external tables; managed Hive tables stored in the DBFS root have to be copied into Unity Catalog (for example with CREATE TABLE AS SELECT or a deep clone). Plan for a 2–4 week migration window for most organizations.


The gotcha: ownership transfers require careful planning. In Hive Metastore, the workspace admin owns everything by default. In Unity Catalog, you need to explicitly assign owners to catalogs, schemas, and tables. If you don't plan this upfront, you'll end up with a single account admin who owns 500 tables and becomes a bottleneck for every permission request.
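Ownership can be reassigned with ALTER ... OWNER TO, so the fix is to script the handoff to the owning teams up front (group names are illustrative):

```sql
-- Hand ownership of a schema and one of its tables to the owning team
ALTER SCHEMA production.sales OWNER TO `sales-data-owners`;
ALTER TABLE production.sales.orders OWNER TO `sales-data-owners`;
```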


When Unity Catalog Matters


Unity Catalog pays off fastest when you run more than one workspace, when compliance requirements like SOX, HIPAA, or GDPR demand audit trails and PII tracking, or when you need to share data with external partners. A single workspace with a small team can live with its Hive Metastore for a while - but as the introduction shows, most deployments don't stay that small.

Gotchas and Limitations


- Unity Catalog requires a Premium or Enterprise plan; it is not available on Standard.
- You get one metastore per region, and each workspace can be assigned to only one metastore.
- External tables must reference registered external locations; they can't point at arbitrary storage paths.
- SYNC doesn't move data, and ownership doesn't carry over from Hive Metastore - assign owners to catalogs, schemas, and tables before migrating, or one admin becomes the bottleneck.

Key Takeaways


- Unity Catalog replaces per-workspace Hive Metastores with one shared metastore: one place for access policies, lineage, and audit logs.
- The catalog.schema.table namespace lets you grant access at any level, with permissions inherited downward.
- Column-level lineage, search, and tagging are built in with no manual configuration.
- Delta Sharing distributes data to external recipients, Databricks or not, without copying it.
- Migration via SYNC is in-place, but plan ownership assignments before you start.

Mohan, Senior Data Engineer

Mohan is a Senior Data Engineer at CelestInfo who evaluates and compares data platforms, tools, and architectures to help clients choose the right technology stack.

Q: Does Unity Catalog require Databricks Premium or Enterprise?

Yes. Unity Catalog is not available on the Databricks Standard plan. You need Premium or Enterprise tier to use Unity Catalog features including centralized governance, data lineage, and Delta Sharing.

Q: Can I migrate existing Hive Metastore tables to Unity Catalog?

Yes. Databricks provides a SYNC command that upgrades Hive Metastore tables to Unity Catalog. You can run both in parallel during the migration period, giving teams time to update references before fully decommissioning the legacy metastore.

Q: What is the 3-level namespace in Unity Catalog?

Unity Catalog uses a catalog.schema.table namespace. The catalog is the top-level container (often mapped to environments or business units), schema groups related tables, and table is the actual data object. This replaces the flat database.table structure in Hive Metastore.

Q: Does Unity Catalog support column-level lineage?

Yes. Unity Catalog automatically tracks column-level lineage across notebooks, jobs, and Delta Live Tables pipelines. You can trace any column back to its source tables without any manual configuration or additional tooling.
