Databricks Mosaic AI: A Practical Guide to Building AI on Your Data
Quick answer: Mosaic AI is Databricks' integrated platform for building, deploying, and governing AI applications on your own data. It includes the Agent Framework for building AI agents, Model Serving that handles 250,000+ queries per second, Storage-Optimized Vector Search that scales to billions of vectors at roughly 7x lower cost, a generally available AI Gateway for unified model access, and serverless GPU compute with A10g and H100 instances. Everything is governed through Unity Catalog, and Databricks was named a leader in the IDC MarketScape for AI Governance 2025-2026.
The Big Picture: Why Mosaic AI Exists
Most organizations trying to build AI applications hit the same wall. They have data in one place, models in another, serving infrastructure somewhere else, and governance spread across all of it. The result is a fragmented stack where getting a model from prototype to production takes months instead of days.
Mosaic AI solves this by bringing every piece of the AI lifecycle into the Databricks Lakehouse Platform. Data preparation, model training, fine-tuning, evaluation, deployment, and monitoring all live in one environment. Your data stays in Delta Lake tables. Your models are versioned in Unity Catalog. Your agents are deployed through Model Serving. No shuffling data between systems.
For teams already running Databricks for data engineering and analytics, Mosaic AI is the natural next step. You are not adding a new platform. You are activating AI capabilities on the platform you already use.
The Mosaic AI Agent Framework
AI agents are programs that can take actions, retrieve information, and make decisions autonomously. Building them from scratch is hard. You need to handle tool calling, memory management, retrieval, error handling, and evaluation. The Agent Framework gives you the scaffolding so you can focus on the logic that matters.
Here is what you get:
- Agent authoring tools that let you define agent behavior, available tools, and retrieval sources in Python
- Agent evaluation that automatically tests your agent against a set of scenarios and measures quality metrics like answer correctness, relevance, and groundedness
- Agent deployment that packages your agent as a REST endpoint through Model Serving with built-in monitoring
- Agent Bricks for auto-optimized agents. Instead of manually tuning retrieval thresholds and chunk sizes, Agent Bricks tests different configurations and picks the best one
```python
import mlflow
from databricks.agents import Agent, Tool

# Define a tool that queries your data
@Tool
def get_customer_info(customer_id: str) -> dict:
    """Retrieve customer details from the warehouse."""
    # Parameterized query avoids SQL injection from agent-supplied input
    result = spark.sql(
        """
        SELECT name, plan, lifetime_value, last_contact
        FROM customers.gold.customer_360
        WHERE customer_id = :customer_id
        """,
        args={"customer_id": customer_id},
    ).first()
    return result.asDict()

# Create an agent with tools and instructions
agent = Agent(
    model="databricks-dbrx-instruct",
    tools=[get_customer_info],
    instructions="""You are a customer support agent.
Use the get_customer_info tool to look up customer details.
Be concise and helpful.""",
)

# Log the agent to MLflow for versioning and deployment
with mlflow.start_run():
    mlflow.langchain.log_model(agent, "customer_agent")
```
The Agent Framework integrates with MLflow for experiment tracking and model versioning. Every agent version, every evaluation result, and every deployment is tracked. This matters a lot when you need to audit why an agent gave a particular response or when you need to roll back to a previous version.
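The version addressing that makes rollback possible can be sketched with MLflow model URIs: once an agent is registered in Unity Catalog, every version is addressable by a three-level name. The catalog, schema, and model names below are hypothetical.

```python
# Sketch: addressing agent versions registered in Unity Catalog via
# MLflow "models:/" URIs. All names here are hypothetical examples.

def model_uri(full_name: str, version: int) -> str:
    """Build a 'models:/' URI for a specific registered model version."""
    return f"models:/{full_name}/{version}"

AGENT = "main.agents.customer_agent"  # hypothetical three-level UC name

current = model_uri(AGENT, 4)   # the version serving traffic today
previous = model_uri(AGENT, 3)  # the version to roll back to if needed

# In a real workspace you would reload or redeploy with e.g.:
#   mlflow.pyfunc.load_model(previous)
print(current)   # models:/main.agents.customer_agent/4
print(previous)  # models:/main.agents.customer_agent/3
```

Because each deployed version is pinned by URI rather than by "latest", a rollback is just pointing the endpoint back at the previous URI.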
Model Serving: 250K Queries Per Second
Model Serving is where your models and agents go to production. It is a fully managed serving layer that auto-scales based on traffic. The platform handles load balancing, GPU allocation, batching, and failover.
The numbers are impressive. Model Serving can handle over 250,000 queries per second with low latency. That is enough for real-time inference in production applications, customer-facing chatbots, recommendation engines, and fraud detection systems.
You can serve custom models trained on Databricks, open-source models from Hugging Face, or external provider models (such as OpenAI's GPT models) through the Foundation Model APIs and external model endpoints. The serving layer provides a consistent REST API regardless of what model is behind it. This means you can swap models without changing your application code.
For teams migrating from self-managed inference infrastructure, Model Serving eliminates the operational burden of managing GPU clusters, handling scaling, and dealing with model deployment pipelines. If your team is currently building data engineering pipelines on Databricks, adding AI serving is a natural extension.
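That consistent REST interface means every endpoint is invoked the same way. A minimal standard-library sketch of building such a request; the workspace URL, endpoint name, and token are placeholders you would replace with your own.

```python
import json
import urllib.request

def build_invocation(workspace_url: str, endpoint: str, token: str, messages: list):
    """Build the HTTP request for a Model Serving endpoint invocation.

    Every serving endpoint shares the '/serving-endpoints/<name>/invocations'
    path, so swapping the model behind an endpoint needs no client changes.
    """
    url = f"{workspace_url}/serving-endpoints/{endpoint}/invocations"
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_invocation(
    "https://my-workspace.cloud.databricks.com",  # placeholder workspace URL
    "customer_agent",                             # placeholder endpoint name
    "<personal-access-token>",                    # placeholder token
    [{"role": "user", "content": "What plan is customer 42 on?"}],
)
# urllib.request.urlopen(req) would send it; omitted here.
```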
Vector Search: Scaled and Cost-Effective
Vector search is essential for RAG applications, recommendation systems, and similarity matching. Mosaic AI includes a built-in Vector Search service that is tightly integrated with Delta Lake and Unity Catalog.
The Storage-Optimized Vector Search option is the highlight here. It scales to billions of vectors while costing roughly 7x less than the compute-optimized tier. How? It uses disk-based indexing that keeps only the most frequently accessed vectors in memory, with the rest on fast SSD storage. For most use cases, the latency difference is negligible, but the cost savings are significant.
```python
from databricks.vector_search.client import VectorSearchClient

# Create a vector search endpoint
client = VectorSearchClient()
client.create_endpoint(name="product_search_endpoint")

# Create a vector index that syncs from a Delta table
index = client.create_delta_sync_index(
    endpoint_name="product_search_endpoint",
    index_name="catalog.schema.product_embeddings_index",
    source_table_name="catalog.schema.products",
    primary_key="product_id",
    embedding_source_column="description",
    # Serving endpoint that computes embeddings from the source column
    embedding_model_endpoint_name="databricks-bge-large-en",
    pipeline_type="TRIGGERED",  # or "CONTINUOUS" for real-time sync
)

# Query the index
results = index.similarity_search(
    query_text="lightweight running shoes for trail running",
    columns=["product_id", "name", "price", "description"],
    num_results=10,
)
```
A key advantage: the vector index stays in sync with your Delta table automatically. When new rows are added or existing rows are updated, the index updates too. No separate ETL pipeline needed to keep embeddings current. You can choose triggered sync (on-demand) or continuous sync (real-time). This integration with Unity Catalog means the same access controls that govern your tables also govern your vector indexes.
AI Gateway: One Interface for All Models
The AI Gateway is now generally available. It acts as a unified proxy layer between your applications and model providers. Whether you are calling a custom model on Databricks, an open-source model, or an external provider like OpenAI, everything goes through the same API.
Why does this matter? Three reasons.
First, cost tracking. The Gateway logs every request with token counts and cost estimates. You can see exactly which team, project, or application is consuming model resources. This visibility is critical when AI usage starts growing across the organization.
Second, rate limiting and guardrails. You can set per-endpoint rate limits to prevent runaway costs. You can also configure content filters and safety guardrails that apply consistently across all model calls.
Third, provider flexibility. If you want to switch from one model provider to another, you change the Gateway configuration. Your application code stays the same. This avoids vendor lock-in at the model layer.
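Rate limits, guardrails, and usage tracking are configured per endpoint. The sketch below shows what such a configuration payload might look like; the field names approximate the settings described above, and the exact schema is an assumption to check against the current Databricks API reference.

```python
# Sketch of an AI Gateway endpoint configuration payload. Field names
# approximate the documented settings (rate limits, guardrails, usage
# tracking); treat the exact schema as an assumption, not the real API.

def gateway_config(calls_per_minute: int, block_pii: bool = True) -> dict:
    return {
        "rate_limits": [
            # Cap total calls to this endpoint to prevent runaway costs
            {"calls": calls_per_minute, "renewal_period": "minute", "key": "endpoint"}
        ],
        "guardrails": {
            # Content filter applied consistently to every model call
            "input": {"pii": {"behavior": "BLOCK" if block_pii else "NONE"}}
        },
        # Log token counts and cost estimates per request
        "usage_tracking_config": {"enabled": True},
    }

cfg = gateway_config(600)
```

Because the configuration lives on the Gateway rather than in application code, tightening a rate limit or swapping the backing provider is a config change, not a deployment.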
Serverless GPU Compute
Training and fine-tuning models requires GPUs. Historically, that meant provisioning and managing GPU clusters, which is expensive and operationally complex. Mosaic AI offers serverless GPU compute that eliminates this overhead.
You get access to NVIDIA A10g GPUs for inference and lighter workloads, and NVIDIA H100 GPUs for heavy training and fine-tuning. These are provisioned on demand and billed per second. No reserved instances. No idle GPU costs. When your job finishes, the resources are released.
This is particularly valuable for teams that need GPUs for periodic training jobs but cannot justify dedicated GPU clusters. You fine-tune a model on H100s for a few hours, pay for those hours, and move on. For organizations exploring AI and ML capabilities, serverless GPUs lower the barrier to entry dramatically.
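The economics are easy to sketch. The hourly rate below is a made-up placeholder, not Databricks or cloud pricing; substitute real numbers for your own estimate.

```python
# Hypothetical comparison of per-second serverless GPU billing against a
# dedicated cluster that bills around the clock. The rate is invented
# purely for illustration.

H100_RATE_PER_HOUR = 10.0  # hypothetical $/GPU-hour placeholder
HOURS_IN_MONTH = 730

def serverless_cost(job_hours: float, jobs_per_month: int, gpus: int) -> float:
    """Pay only while the job runs (shown at hourly granularity)."""
    return job_hours * jobs_per_month * gpus * H100_RATE_PER_HOUR

def dedicated_cost(gpus: int) -> float:
    """A reserved cluster bills every hour of the month, busy or idle."""
    return HOURS_IN_MONTH * gpus * H100_RATE_PER_HOUR

# A weekly 3-hour fine-tuning run on 8 GPUs:
burst = serverless_cost(job_hours=3, jobs_per_month=4, gpus=8)  # 960.0
always_on = dedicated_cost(gpus=8)                              # 58400.0
```

At these placeholder rates the bursty workload costs a small fraction of keeping the cluster warm, which is the whole argument for per-second billing on periodic training jobs.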
Unity Catalog: Governance for AI
Databricks was named a leader in the IDC MarketScape for AI Governance 2025-2026, and Unity Catalog is why. It provides a single governance layer that covers your data, your models, your feature tables, and your AI endpoints.
Here is what Unity Catalog governs in the AI context:
- Model registry: Every model version is tracked with metadata, parameters, metrics, and lineage. You can see which data was used to train a model and which endpoints are serving it.
- Feature tables: Features used for model training and inference are versioned and access-controlled. This prevents data leakage and ensures reproducibility.
- Vector search indexes: Access controls on the underlying data carry through to the vector index. If a user cannot see a row in the source table, they cannot retrieve it through vector search.
- Model endpoints: You control who can deploy, query, and manage serving endpoints through standard Unity Catalog permissions.
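Permissions on these AI assets use the same GRANT statements that govern tables. A minimal sketch that composes such statements as strings; the object names are hypothetical, and the MODEL securable keyword should be verified against current Databricks SQL syntax before use.

```python
# Sketch: Unity Catalog GRANT statements for AI assets. Object names are
# hypothetical; the MODEL securable keyword is an assumption to verify
# against the Databricks SQL reference.

def grant(privilege: str, securable: str, name: str, principal: str) -> str:
    """Compose a Unity Catalog GRANT statement."""
    return f"GRANT {privilege} ON {securable} {name} TO `{principal}`"

stmts = [
    grant("EXECUTE", "MODEL", "main.ml.customer_agent", "support-team"),
    grant("SELECT", "TABLE", "customers.gold.customer_360", "support-team"),
]
# Each statement would be run with spark.sql(...) in a Databricks notebook.
```

The point is that one permission model covers the model, the serving endpoint, and the tables it reads, so there is no separate ACL system to keep in sync.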
For regulated industries, this unified governance is not optional. It is a requirement. Being able to trace a model prediction back to the training data, the feature set, and the model version is essential for compliance and audit. For a deeper look at Unity Catalog setup and best practices, see our Unity Catalog guide.
Mosaic AI vs. Building Outside Databricks
You could build an AI stack outside Databricks. Many teams do. They use SageMaker, Vertex AI, or custom infrastructure. Here is the honest comparison:
- Mosaic AI advantage: Everything is integrated. Data, models, serving, governance, and monitoring live in one platform. No glue code between systems. Faster time to production.
- Mosaic AI advantage: Delta Lake and Unity Catalog integration means your AI features and training data share the same governance as your analytical tables.
- External platform advantage: More flexibility in choosing individual components. If you want a specific serving framework or training library, you have full control.
- External platform advantage: If your data is not on Databricks, building AI separately avoids adding another major platform.
The practical guidance: if your data already lives in Databricks, Mosaic AI is the path of least resistance. The integration benefits outweigh what you get from stitching together separate tools. If you are on a different platform, evaluate whether the migration cost is worth the integration benefits. Our cloud migration strategy guide can help you think through that decision.
Getting Started: A Practical Path
Here is the sequence we recommend for teams new to Mosaic AI:
- Start with the AI Gateway and Foundation Model APIs. Use hosted models (like OpenAI's GPT models or Claude) through the Gateway on your existing data. This gives you AI capabilities with zero model training.
- Add Vector Search for RAG. Point a vector index at your knowledge base or documentation. Build a simple chatbot or search interface. This is the fastest way to demonstrate value.
- Build an agent with the Agent Framework. Once you have retrieval working, wrap it in an agent that can take actions (query data, call APIs, generate reports). Use Agent Bricks to optimize configuration.
- Fine-tune a model for your domain. When generic models are not accurate enough for your specific use case, fine-tune an open-source model on your data using serverless GPU compute.
- Set up governance from day one. Register everything in Unity Catalog. Set up access controls. Enable monitoring. This is much harder to retrofit than to build from the start.
Key Takeaways
- Mosaic AI integrates the full AI lifecycle into the Databricks Lakehouse Platform, from data preparation to production deployment
- The Agent Framework with Agent Bricks makes building AI agents faster by automating configuration optimization
- Model Serving handles 250K+ queries per second with automatic scaling and GPU management
- Storage-Optimized Vector Search scales to billions of vectors at 7x lower cost than compute-optimized alternatives
- The AI Gateway (GA) provides a unified interface for all model providers with cost tracking, rate limiting, and guardrails
- Serverless GPU compute (A10g and H100) eliminates the overhead of managing GPU infrastructure
- Unity Catalog provides unified governance across data, models, features, and AI endpoints
CelestInfo Engineering Team
We build AI applications on Databricks for enterprises. From agent development to production deployment, we handle the engineering. Talk to us
