Media & Entertainment · 280 Employees · 14 Weeks

How a Streaming Service Built a Real-Time Content Recommendation Pipeline on Databricks

The starting point: batch recommendations that updated every 6 hours, ML models deployed via manual Docker pushes, and a 3-day model retraining cycle. Here's how we built a real-time feature engineering pipeline that now serves 50M+ predictions per day.

The Challenge

This streaming platform has about 4.2 million active subscribers watching a mix of licensed and original content across movies, series, and documentaries. Their recommendation engine was built 3 years ago as a batch system: every 6 hours, a set of Spark jobs would crunch user viewing history, generate recommendations, and write them to a Redis cache that the app consumed.

The 6-hour lag was the core problem. When a user binged a new genre on Saturday morning - say they watched 3 true crime documentaries in a row - the recommendations didn't reflect that interest until Saturday afternoon at the earliest. By then, the user had either found something on their own or, more likely, closed the app. User engagement data (plays, pauses, skips, completion rates) lived in Kafka topics, content metadata sat in PostgreSQL, and the ML models were trained in isolated Jupyter notebooks that two data scientists maintained.

The deployment process was painfully manual. When a data scientist wanted to push an updated model to production, they'd export the model artifact, wrap it in a Docker container, push it to ECR, and update a Kubernetes deployment. This took about half a day per deployment, and they could only test model changes in production with a rudimentary A/B split they'd hardcoded into the application layer. Model retraining took 3 days because the feature engineering pipeline - computing genre preferences, watch-time distributions, skip patterns, and content similarity scores - ran as a separate batch job that nobody had optimized since it was first written.

What We Built

We built a real-time feature engineering pipeline on Databricks using Structured Streaming. The pipeline consumes user interaction events directly from Kafka - every play, pause, skip, completion, and search event - and transforms them into real-time features within minutes of the action occurring. The key features we compute in near-real-time include:

  • Recency-weighted genre preferences: A rolling 7-day window where recent watches are weighted 3x compared to watches from 5+ days ago. This is what makes the Saturday morning binge immediately shift recommendations.
  • Watch-time distributions: What times of day and days of week each user typically watches, broken down by content type. A user who watches comedies on weeknight evenings but documentaries on weekend mornings gets different recommendations based on when they open the app.
  • Skip patterns: If a user consistently skips past the first 10 minutes of a show, that's a negative signal as informative as a completion is a positive one.
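The recency weighting above can be sketched in pure Python. This is an illustrative toy, not our production PySpark job; the `genre_preferences` helper and its cutoffs (3x boost under 5 days, 7-day window) mirror the description but are simplified:

```python
def genre_preferences(events, boost=3.0, boost_days=5, window_days=7):
    """Recency-weighted genre scores from (genre, days_ago) watch events.

    Watches within the last `boost_days` count `boost`x; older watches
    inside the rolling window count 1x; anything older is dropped.
    """
    scores = {}
    for genre, days_ago in events:
        if days_ago >= window_days:
            continue  # outside the rolling 7-day window
        weight = boost if days_ago < boost_days else 1.0
        scores[genre] = scores.get(genre, 0.0) + weight
    total = sum(scores.values())
    # Normalize so the scores form a preference distribution.
    return {g: s / total for g, s in scores.items()} if total else {}

# A Saturday-morning true-crime binge immediately dominates the profile:
prefs = genre_preferences([
    ("true_crime", 0), ("true_crime", 0), ("true_crime", 0),  # today
    ("comedy", 6),                                            # last weekend
])
# → {"true_crime": 0.9, "comedy": 0.1}
```

Three fresh watches at weight 3 outvote last weekend's watch at weight 1, which is exactly the behavior that lets a morning binge reshape the afternoon's recommendations.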

These features land in a Delta Lake feature store that serves both real-time inference and model training. Content metadata syncs from PostgreSQL via Debezium CDC into Kafka, then into Delta Lake - so when new content is added or metadata changes (like a show getting reclassified), the feature store reflects it within minutes instead of waiting for the next batch run.
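To make the CDC flow concrete, here is a simplified sketch of applying Debezium change events. The envelope fields (`payload`, `op`, `before`, `after`, with ops `c`/`u`/`d`/`r`) are Debezium's actual format; the in-memory dict stands in for the Delta MERGE target, and the `content_id` key and sample rows are made up:

```python
import json

def apply_change_event(table, raw_event, key="content_id"):
    """Apply one Debezium change event to a keyed store.

    Debezium emits op 'c' (create), 'u' (update), 'd' (delete), and
    'r' (snapshot read); a dict stands in for the Delta MERGE target.
    """
    payload = json.loads(raw_event)["payload"]
    op = payload["op"]
    if op in ("c", "u", "r"):
        row = payload["after"]
        table[row[key]] = row  # upsert the row's current state
    elif op == "d":
        table.pop(payload["before"][key], None)  # remove the deleted row
    return table

metadata = {}
apply_change_event(metadata, json.dumps({"payload": {
    "op": "c", "before": None,
    "after": {"content_id": 42, "title": "Cold Case Files", "genre": "drama"},
}}))
# A show gets reclassified: the update lands as a 'u' event within minutes.
apply_change_event(metadata, json.dumps({"payload": {
    "op": "u",
    "before": {"content_id": 42, "title": "Cold Case Files", "genre": "drama"},
    "after": {"content_id": 42, "title": "Cold Case Files", "genre": "true_crime"},
}}))
```

In the real pipeline the same upsert-or-delete decision runs as a streaming MERGE into the Delta table rather than a dict update.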

The ML models - a hybrid of collaborative filtering and content-based approaches - retrain daily on Databricks using MLflow for experiment tracking and model registry. We replaced the manual Docker deployment process with Databricks Model Serving, which handles the containerization and scaling automatically. Inference latency is under 100ms at the p99. A/B testing is managed through MLflow's model versioning: traffic splits between model versions are configured in MLflow, not hardcoded in application code.
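The configured traffic split amounts to a small routing table. The sketch below shows the shape of such a config and the sanity check it needs; the model names are invented, and the dict layout is illustrative rather than the exact Databricks payload:

```python
def traffic_config(routes):
    """Build an A/B routing config and sanity-check the split.

    `routes` maps a registered model version name to its traffic share
    in percent; shares must sum to exactly 100.
    """
    total = sum(routes.values())
    if total != 100:
        raise ValueError(f"traffic must sum to 100, got {total}")
    return {"routes": [
        {"served_model_name": name, "traffic_percentage": pct}
        for name, pct in routes.items()
    ]}

# A 90/10 canary between the current and candidate model versions:
config = traffic_config({"recsys-hybrid-v7": 90, "recsys-hybrid-v8": 10})
```

Keeping this split in configuration next to the model registry, instead of hardcoded in the application layer, is what lets a data scientist change the experiment without an app release.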

Results

  • Recommendation latency: 6 hrs → <5 min
  • Average session time: +23%
  • Model retraining (automated): 3 days → 4 hrs
  • Predictions served from feature store: 50M+/day

Tech Stack

Databricks (Structured Streaming) · MLflow · Databricks Model Serving · Kafka · Delta Lake · PostgreSQL · Debezium

What We Learned

  • Recency weighting matters more than model complexity. Our first model iteration used a sophisticated deep learning architecture, and it performed about the same as the simpler hybrid model. What actually moved engagement metrics was the recency-weighted features - getting fresh signals into the model within minutes instead of hours. The simpler model with real-time features beat the complex model with 6-hour-old features every time.
  • Debezium CDC is the right way to sync relational data into a streaming pipeline. The team's previous approach was a nightly pg_dump that they'd load into the feature pipeline. Debezium captures every INSERT, UPDATE, and DELETE from PostgreSQL as a Kafka event, which means the feature store always has current content metadata. The setup took about 2 days, including schema configuration for 12 PostgreSQL tables.
  • MLflow plus Databricks Model Serving eliminated a full-day deployment cycle. The old process - export the model, build a Docker image, push it to ECR, update the K8s deployment - took a data scientist 4-5 hours and was error-prone. Now they register a model version, promote it to the "Production" stage, and Model Serving picks it up automatically. Deployment went from half a day to 15 minutes.
  • Skip patterns are an underrated signal. Most recommendation systems focus on positive signals (watches, completions, ratings). We found that skip patterns - consistently abandoning a certain type of content within the first few minutes - were one of the strongest negative signals in our feature set. Adding skip features improved recommendation relevance by about 11% in A/B testing.
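The skip signal from the last point reduces to a simple session-level feature. A minimal sketch, assuming each session is recorded as (minutes watched, completed); the `skip_rate` helper and the 10-minute threshold are illustrative:

```python
def skip_rate(sessions, skip_window_min=10):
    """Fraction of sessions abandoned within the first `skip_window_min`
    minutes. Each session is a (minutes_watched, completed) pair; a high
    rate for a content type is a strong negative signal for that user.
    """
    if not sessions:
        return 0.0
    skips = sum(1 for minutes, completed in sessions
                if not completed and minutes < skip_window_min)
    return skips / len(sessions)

# A user who keeps bailing on a genre early, with one genuine watch:
rate = skip_rate([(3, False), (7, False), (45, True), (5, False)])
# → 0.75
```

Fed into the model alongside the positive signals, a rate like this down-weights a content type even when the user has never rated or finished anything in it.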

Recommendations Running on Stale Data?

Batch updates, manual model deployments, engagement data that's hours old by the time it reaches your models - we can fix that. Let's talk about your pipeline.

Start a Conversation