Databricks Mosaic AI: A Practical Guide to Building AI on Your Data

Quick answer: Mosaic AI is Databricks' integrated platform for building, deploying, and governing AI applications on your own data. It includes the Agent Framework for building AI agents, Model Serving capable of handling 250,000+ queries per second, Storage-Optimized Vector Search that scales to billions of vectors at roughly one-seventh the cost, a generally available AI Gateway for unified model access, and serverless GPU compute with A10G and H100 instances. Everything is governed through Unity Catalog, and Databricks was named a Leader in the IDC MarketScape for AI Governance 2025-2026.

The Big Picture: Why Mosaic AI Exists

Most organizations trying to build AI applications hit the same wall. They have data in one place, models in another, serving infrastructure somewhere else, and governance spread across all of it. The result is a fragmented stack where getting a model from prototype to production takes months instead of days.

Mosaic AI solves this by bringing every piece of the AI lifecycle into the Databricks Lakehouse Platform. Data preparation, model training, fine-tuning, evaluation, deployment, and monitoring all live in one environment. Your data stays in Delta Lake tables. Your models are versioned in Unity Catalog. Your agents are deployed through Model Serving. No shuffling data between systems.

For teams already running Databricks for data engineering and analytics, Mosaic AI is the natural next step. You are not adding a new platform. You are activating AI capabilities on the platform you already use.

The Mosaic AI Agent Framework

AI agents are programs that can take actions, retrieve information, and make decisions autonomously. Building them from scratch is hard. You need to handle tool calling, memory management, retrieval, error handling, and evaluation. The Agent Framework gives you the scaffolding so you can focus on the logic that matters.

Here is what you get:

Python
import mlflow
from databricks.agents import Agent, Tool

# Define a tool that queries your data
@Tool
def get_customer_info(customer_id: str) -> dict:
    """Retrieve customer details from the warehouse."""
    result = spark.sql(
        """
        SELECT name, plan, lifetime_value, last_contact
        FROM customers.gold.customer_360
        WHERE customer_id = :customer_id
        """,
        args={"customer_id": customer_id},  # parameterized to block SQL injection
    ).first()
    return result.asDict()

# Create an agent with tools and instructions
agent = Agent(
    model="databricks-dbrx",
    tools=[get_customer_info],
    instructions="""You are a customer support agent.
    Use the get_customer_info tool to look up customer details.
    Be concise and helpful."""
)

# Log and deploy
with mlflow.start_run():
    mlflow.langchain.log_model(agent, "customer_agent")

The Agent Framework integrates with MLflow for experiment tracking and model versioning. Every agent version, every evaluation result, and every deployment is tracked. This matters a lot when you need to audit why an agent gave a particular response or when you need to roll back to a previous version.

Model Serving: 250K Queries Per Second

Model Serving is where your models and agents go to production. It is a fully managed serving layer that auto-scales based on traffic. The platform handles load balancing, GPU allocation, batching, and failover.

The numbers are impressive. Model Serving can handle over 250,000 queries per second with low latency. That is enough for real-time inference in production applications, customer-facing chatbots, recommendation engines, and fraud detection systems.

You can serve custom models trained on Databricks, open-source models from Hugging Face, or external models like GPT-5.2 through the Foundation Model APIs. The serving layer provides a consistent REST API regardless of what model is behind it. This means you can swap models without changing your application code.

For teams migrating from self-managed inference infrastructure, Model Serving eliminates the operational burden of managing GPU clusters, handling scaling, and dealing with model deployment pipelines. If your team is currently building data engineering pipelines on Databricks, adding AI serving is a natural extension.
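The "consistent REST API" claim is easiest to see in code. This sketch assembles an invocation request; the host and endpoint name are hypothetical, and the chat-style payload shape is an assumption based on common serving-endpoint conventions:

```python
import json

# Hypothetical workspace host -- substitute your own.
HOST = "https://my-workspace.cloud.databricks.com"

def build_invocation(endpoint: str, messages: list, max_tokens: int = 256):
    """Assemble the URL and JSON body for a serving-endpoint call.

    The REST shape stays the same whatever model sits behind the
    endpoint, which is why swapping models does not touch this code.
    """
    url = f"{HOST}/serving-endpoints/{endpoint}/invocations"
    body = json.dumps({"messages": messages, "max_tokens": max_tokens})
    return url, body

url, body = build_invocation(
    "customer_agent",
    [{"role": "user", "content": "What plan is customer 42 on?"}],
)
# requests.post(url, data=body, headers={"Authorization": f"Bearer {token}"})
```

Replacing the model behind `customer_agent` changes nothing here; only the endpoint configuration moves.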

Vector Search: Scaled and Cost-Effective

Vector search is essential for RAG applications, recommendation systems, and similarity matching. Mosaic AI includes a built-in Vector Search service that is tightly integrated with Delta Lake and Unity Catalog.

The Storage-Optimized Vector Search option is the highlight here. It scales to billions of vectors at roughly one-seventh the cost of the compute-optimized tier. How? It uses disk-based indexing that keeps only the most frequently accessed vectors in memory, with the rest on fast SSD storage. For most use cases, the latency difference is negligible, but the cost savings are significant.

Python
from databricks.vector_search.client import VectorSearchClient

# Create a vector search endpoint
client = VectorSearchClient()
client.create_endpoint(name="product_search_endpoint")

# Create a vector index from a Delta table
index = client.create_delta_sync_index(
    endpoint_name="product_search_endpoint",
    index_name="catalog.schema.product_embeddings_index",
    source_table_name="catalog.schema.products",
    primary_key="product_id",
    embedding_source_column="description",
    embedding_model_endpoint_name="databricks-bge-large-en",  # endpoint that computes embeddings
    pipeline_type="TRIGGERED"  # or "CONTINUOUS" for real-time sync
)

# Query the index
results = index.similarity_search(
    query_text="lightweight running shoes for trail running",
    columns=["product_id", "name", "price", "description"],
    num_results=10
)

A key advantage: the vector index stays in sync with your Delta table automatically. When new rows are added or existing rows are updated, the index updates too. No separate ETL pipeline needed to keep embeddings current. You can choose triggered sync (on-demand) or continuous sync (real-time). This integration with Unity Catalog means the same access controls that govern your tables also govern your vector indexes.

AI Gateway: One Interface for All Models

The AI Gateway is now generally available. It acts as a unified proxy layer between your applications and model providers. Whether you are calling a custom model on Databricks, an open-source model, or an external provider like OpenAI, everything goes through the same API.

Why does this matter? Three reasons.

First, cost tracking. The Gateway logs every request with token counts and cost estimates. You can see exactly which team, project, or application is consuming model resources. This visibility is critical when AI usage starts growing across the organization.

Second, rate limiting and guardrails. You can set per-endpoint rate limits to prevent runaway costs. You can also configure content filters and safety guardrails that apply consistently across all model calls.

Third, provider flexibility. If you want to switch from one model provider to another, you change the Gateway configuration. Your application code stays the same. This avoids vendor lock-in at the model layer.
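That flexibility is visible at the call site. Databricks serving endpoints expose an OpenAI-compatible API, so the standard `openai` client works with its `base_url` pointed at the workspace; the host, token, and endpoint names below are placeholders:

```python
# Sketch: one call path for any Gateway-fronted model.
def gateway_call(endpoint: str, prompt: str) -> str:
    """Send a chat request to a Gateway-fronted endpoint.

    `endpoint` can name a custom model, an open-source model, or an
    external provider; the application code is identical either way.
    """
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        api_key="<databricks-token>",  # placeholder personal access token
        base_url="https://my-workspace.cloud.databricks.com/serving-endpoints",
    )
    resp = client.chat.completions.create(
        model=endpoint,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Switching providers = changing the endpoint name passed in, nothing else.
```

Because every request flows through the Gateway, the rate limits, cost logs, and guardrails described above apply to this call automatically.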

Serverless GPU Compute

Training and fine-tuning models requires GPUs. Historically, that meant provisioning and managing GPU clusters, which is expensive and operationally complex. Mosaic AI offers serverless GPU compute that eliminates this overhead.

You get access to NVIDIA A10G GPUs for inference and lighter workloads, and NVIDIA H100 GPUs for heavy training and fine-tuning. These are provisioned on demand and billed per second. No reserved instances. No idle GPU costs. When your job finishes, the resources are released.

This is particularly valuable for teams that need GPUs for periodic training jobs but cannot justify dedicated GPU clusters. You fine-tune a model on H100s for a few hours, pay for those hours, and move on. For organizations exploring AI and ML capabilities, serverless GPUs lower the barrier to entry dramatically.
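The arithmetic behind that claim is simple. The hourly rate below is purely illustrative, not a Databricks price; the point is that per-second billing means cost tracks usage exactly:

```python
def gpu_job_cost(seconds_used: float, hourly_rate: float) -> float:
    """Cost of a serverless GPU job billed per second.

    hourly_rate is an illustrative assumption -- check Databricks
    pricing for real numbers.
    """
    return round(seconds_used * (hourly_rate / 3600), 2)

# A 3.5-hour fine-tuning run on an H100 at a hypothetical $10/hour:
cost = gpu_job_cost(3.5 * 3600, 10.0)
# With a dedicated cluster you would also pay for the idle time between
# runs; here billing stops the second the job finishes.
```

For a job that runs a few hours a week, the gap between this and a reserved GPU cluster running 24/7 is the whole business case.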

Unity Catalog: Governance for AI

Databricks was named a leader in the IDC MarketScape for AI Governance 2025-2026, and Unity Catalog is why. It provides a single governance layer that covers your data, your models, your feature tables, and your AI endpoints.

In the AI context, Unity Catalog governs your registered models and their versions, your feature tables and the Delta tables used for training and retrieval, your vector indexes (which inherit the permissions of their source tables), and your serving endpoints, all under the same access controls that already govern your data.

For regulated industries, this unified governance is not optional. It is a requirement. Being able to trace a model prediction back to the training data, the feature set, and the model version is essential for compliance and audit. For a deeper look at Unity Catalog setup and best practices, see our Unity Catalog guide.
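In practice, governing an AI asset looks just like governing a table. This is a hedged sketch: the catalog, schema, and principal names are placeholders, and the exact privilege syntax should be checked against the Unity Catalog GRANT documentation:

```python
# Sketch: Unity Catalog privileges applied to AI assets alongside data.
# Names are hypothetical; verify privilege syntax for your DBR version.
grants = [
    # Let the support team invoke the registered model but not alter it.
    "GRANT EXECUTE ON MODEL main.agents.customer_agent TO `support-team`",
    # The same control surface governs the data the agent retrieves.
    "GRANT SELECT ON TABLE customers.gold.customer_360 TO `support-team`",
]
# for stmt in grants:
#     spark.sql(stmt)  # runs inside a Databricks workspace
```

One GRANT model for tables, models, and the indexes built from them is what makes the lineage-to-audit story workable.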

Mosaic AI vs. Building Outside Databricks

You could build an AI stack outside Databricks. Many teams do, using SageMaker, Vertex AI, or custom infrastructure. Those stacks work, but you own the glue between data, training, serving, and governance yourself, and that glue is where most of the operational cost hides.

The practical guidance: if your data already lives in Databricks, Mosaic AI is the path of least resistance. The integration benefits outweigh what you get from stitching together separate tools. If you are on a different platform, evaluate whether the migration cost is worth the integration benefits. Our cloud migration strategy guide can help you think through that decision.

Getting Started: A Practical Path

Here is the sequence we recommend for teams new to Mosaic AI:

  1. Start with the AI Gateway and Foundation Model APIs. Use external models (like GPT-5.2 or Claude) through the Gateway on your existing data. This gives you AI capabilities with zero model training.
  2. Add Vector Search for RAG. Point a vector index at your knowledge base or documentation. Build a simple chatbot or search interface. This is the fastest way to demonstrate value.
  3. Build an agent with the Agent Framework. Once you have retrieval working, wrap it in an agent that can take actions (query data, call APIs, generate reports). Use Agent Bricks to optimize configuration.
  4. Fine-tune a model for your domain. When generic models are not accurate enough for your specific use case, fine-tune an open-source model on your data using serverless GPU compute.
  5. Set up governance from day one. Register everything in Unity Catalog. Set up access controls. Enable monitoring. This is much harder to retrofit than to build from the start.
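Steps 2 and 3 meet at a simple seam: rows returned by the vector index become grounding context in the agent's prompt. A minimal, dependency-free sketch of that glue (the field names and prompt wording are illustrative):

```python
def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Fold retrieved chunks into a grounded prompt for the model."""
    context = "\n\n".join(
        f"[{i + 1}] {c['name']}: {c['description']}" for i, c in enumerate(chunks)
    )
    return (
        "Answer using only the context below. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# In practice `chunks` would come from index.similarity_search(...):
prompt = build_rag_prompt(
    "Which shoes suit trail running?",
    [{"name": "TrailFlex 2", "description": "Lightweight trail running shoe."}],
)
```

This is also the seam where Agent Bricks operates, tuning how many chunks to retrieve and how to rank them before they reach the prompt.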

Key Takeaways

  1. Mosaic AI brings the full AI lifecycle (data prep, training, serving, governance) onto the platform your data already lives on.
  2. Model Serving handles 250,000+ queries per second; Storage-Optimized Vector Search scales to billions of vectors at roughly one-seventh the cost of the compute-optimized tier.
  3. The generally available AI Gateway gives you one interface plus cost tracking, rate limits, and guardrails across custom, open-source, and external models.
  4. Serverless A10G and H100 GPUs are billed per second, so periodic training jobs no longer require dedicated clusters.
  5. Register models, features, and endpoints in Unity Catalog from day one; governance is far harder to retrofit.

CelestInfo Engineering Team

We build AI applications on Databricks for enterprises. From agent development to production deployment, we handle the engineering.


Frequently Asked Questions About Mosaic AI

What teams ask us most about building AI on Databricks

How is Mosaic AI priced?

Mosaic AI pricing varies by component. Model Serving is billed based on the compute used (DBUs per hour) and the model size. Vector Search is billed per storage and query volume. Serverless GPU compute (A10G and H100 instances) is billed per second of usage. The AI Gateway itself does not have a separate charge, but the underlying model calls and compute are billed. Contact Databricks for specific pricing based on your expected workload.

What GPU options does Databricks offer for training and fine-tuning?

Databricks offers serverless GPU compute with NVIDIA A10G GPUs for inference and lighter training workloads, and NVIDIA H100 GPUs for heavy training and fine-tuning. These are provisioned on demand, so you do not need to reserve or manage GPU clusters. Availability depends on your cloud provider region (AWS, Azure, or GCP).

How does Mosaic AI compare to Snowflake Cortex AI?

Mosaic AI and Snowflake Cortex AI take different approaches. Cortex AI focuses on SQL-native AI functions that run inside Snowflake, ideal for teams that want to add AI to existing SQL workflows without managing infrastructure. Mosaic AI is a more comprehensive platform for building, training, fine-tuning, and deploying custom AI models and agents. If you need custom models or agentic workflows, Mosaic AI offers more flexibility. If you want simple AI functions on warehouse data, Cortex is more straightforward.

Can I call external model providers like OpenAI or Anthropic through Mosaic AI?

Yes. The AI Gateway supports external model providers including OpenAI (GPT-5.2), Anthropic (Claude), and others through Foundation Model APIs. The Gateway provides a unified interface so you can switch between providers without changing your application code. It also adds rate limiting, cost tracking, and audit logging on top of external API calls.

What is Agent Bricks?

Agent Bricks is a feature within the Mosaic AI Agent Framework that automatically optimizes agent configurations. Instead of manually tuning parameters like retrieval thresholds, chunk sizes, and model selection, Agent Bricks tests different configurations and selects the best-performing one based on evaluation metrics. This reduces the trial-and-error involved in building production-quality AI agents.
