Engineering Deep Dive
How CineMatch Builds Recommendations
A two-stage pipeline that combines vector similarity search with a learned ranking model to surface movies you will actually want to watch.
01
How recommendations work
The two-stage pipeline
Every recommendation request flows through two stages. First, we cast a wide net using vector search to find movies that are semantically close to the user's taste. Then, a scoring model re-ranks those candidates using richer signals to surface the best results.
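The two stages can be sketched as a single function: retrieval casts the wide net, scoring re-orders it. The function names and candidate count here are illustrative, not CineMatch's actual API.

```python
from typing import Callable

def recommend(user_id: str,
              retrieve: Callable[[str, int], list[str]],
              score: Callable[[str, str], float],
              k: int = 10) -> list[str]:
    """Two-stage pipeline: wide, cheap retrieval, then precise re-ranking."""
    # Stage 1: cast a wide net -- ~50 semantically similar candidates.
    candidates = retrieve(user_id, 50)
    # Stage 2: re-rank with richer signals, keep the top k.
    return sorted(candidates, key=lambda m: score(user_id, m), reverse=True)[:k]
```

The key design point is that the expensive model only ever sees ~50 candidates, not the whole catalog.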
02
The retrieval stage
Vector search with pgvector
Every movie is converted into a 1536-dimensional embedding using OpenAI's text-embedding-3-small model. The input combines the movie's plot summary, genres, release year, and key metadata into a single dense vector that captures its semantic identity.
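Building that single input text might look like the sketch below; the field names and separator are assumptions for illustration, not the exact format CineMatch sends to text-embedding-3-small.

```python
def embedding_input(movie: dict) -> str:
    """Flatten a movie's metadata into one text blob for the embedding
    model. Field names and layout are illustrative assumptions."""
    return " | ".join([
        movie["title"],
        f"Genres: {', '.join(movie['genres'])}",
        f"Year: {movie['year']}",
        movie["plot"],
    ])
```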
User preferences are encoded the same way, built from the embeddings of movies they have liked and watched, weighted by recency.
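One way to realize a recency-weighted average is exponential decay over the age of each interaction; the half-life value and decay scheme below are assumptions, not CineMatch's documented weighting.

```python
def user_embedding(liked: list[tuple[list[float], float]],
                   half_life_days: float = 30.0) -> list[float]:
    """Recency-weighted average of liked movies' embeddings.
    `liked` pairs each embedding with its age in days. Exponential
    decay with a 30-day half-life is an illustrative assumption."""
    dim = len(liked[0][0])
    acc = [0.0] * dim
    total = 0.0
    for vec, age_days in liked:
        w = 0.5 ** (age_days / half_life_days)  # newer likes count more
        total += w
        for i, x in enumerate(vec):
            acc[i] += w * x
    return [x / total for x in acc]
```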
Finding candidates is a nearest-neighbor search: we use pgvector's HNSW index to find the 50 movies with the highest cosine similarity to the user's embedding. This runs in under 50ms, even across the full catalog.
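The HNSW index is an approximate, fast version of the following brute-force search. In SQL the query is roughly `ORDER BY embedding <=> $1 LIMIT 50` (pgvector's `<=>` is cosine distance); the pure-Python equivalent makes the math explicit.

```python
import math

def cosine_sim(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(user_vec: list[float], catalog: dict[str, list[float]], k: int = 50) -> list[str]:
    """Brute-force stand-in for pgvector's HNSW cosine search:
    rank every movie by cosine similarity, keep the closest k."""
    ranked = sorted(catalog.items(),
                    key=lambda kv: cosine_sim(user_vec, kv[1]),
                    reverse=True)
    return [movie_id for movie_id, _ in ranked[:k]]
```

Brute force is O(catalog size) per query; HNSW trades a small amount of recall for logarithmic-ish search time, which is how the sub-50ms figure holds across the full catalog.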
03
The ranking stage
Multi-signal re-ranking
Raw similarity is not enough. A movie can be close in embedding space but poorly rated, or popular but not to the user's taste. The ranking stage combines multiple signals into a single score that balances relevance, quality, and diversity.
Scoring Weights

- Semantic similarity: how close the movie is to the user's taste in embedding space
- Rating: the TMDB community rating, normalized to a 0-1 scale
- Popularity: log-scaled, so blockbusters do not drown out everything else
- Genre match: the fraction of the movie's genres that match the user's preferences
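A linear blend of the four signals might look like the following; the numeric weights and the popularity normalization constant are illustrative placeholders, since the tuned values are not shown here.

```python
import math

def linear_score(similarity: float, rating10: float,
                 popularity: float, genre_overlap: float) -> float:
    """Weighted sum of the four ranking signals.
    Weights are illustrative placeholders, not CineMatch's tuned values."""
    w_sim, w_rating, w_pop, w_genre = 0.5, 0.2, 0.1, 0.2
    rating = rating10 / 10.0                                   # TMDB 0-10 -> 0-1
    pop = min(math.log1p(popularity) / math.log1p(1_000_000), 1.0)  # log damping
    return (w_sim * similarity + w_rating * rating
            + w_pop * pop + w_genre * genre_overlap)
```

The log on popularity is what keeps a blockbuster with 100x the views from getting anywhere near 100x the score.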
Upgrade path: When sufficient real interaction data accumulates, the linear scorer is replaced by a LambdaMART model (LightGBM) that directly optimizes NDCG. The model learns non-linear feature interactions that handcrafted weights cannot capture, such as the relationship between genre preferences and popularity thresholds. The ranker service supports both models and routes between them per-request.
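Per-request routing between the two scorers can be as small as the sketch below; the data threshold and A/B-bucket logic are assumptions about how such a router might decide, not the service's actual policy.

```python
def route(interaction_rows: int, ab_bucket: str,
          threshold: int = 100_000) -> str:
    """Choose a ranker per request. Falls back to the linear scorer
    until enough training data exists; threshold and bucket names
    are illustrative assumptions."""
    if interaction_rows < threshold:
        return "linear"
    return "lambdamart" if ab_bucket == "treatment" else "linear"
```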
04
Evaluation
Measuring recommendation quality
NDCG@10
Measures whether the most relevant movies appear at the top of the list; a relevant movie buried at position 8 earns far less credit than one at position 2.
MRR
How quickly a user finds something they want: the mean reciprocal rank of the first relevant result, averaged across all users.
Hit Rate@10
The simplest test: does the top-10 list contain at least one movie the user would actually enjoy?
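The three metrics are standard and small enough to write out; a minimal sketch, using binary relevance labels per ranked position:

```python
import math

def ndcg_at_k(relevances: list[int], k: int = 10) -> float:
    """NDCG@k: discounted gain of the ranked list vs. the ideal ordering."""
    def dcg(rels: list[int]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def mrr(first_hit_ranks: list[int]) -> float:
    """Mean reciprocal rank of each user's first relevant result (0 = no hit)."""
    return sum(1.0 / r for r in first_hit_ranks if r > 0) / len(first_hit_ranks)

def hit_rate_at_k(relevances: list[int], k: int = 10) -> float:
    """1.0 if the top-k contains at least one relevant item, else 0.0."""
    return 1.0 if any(r > 0 for r in relevances[:k]) else 0.0
```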
| Model | NDCG@10 | MRR | Hit Rate |
|---|---|---|---|
| Popularity Baseline | 0.62 | 0.71 | 0.85 |
| Vector Retrieval Only | 0.76 | 0.89 | 0.95 |
| Two-Stage Pipeline (current) | | | |
05
Cold start
What happens for new users
A new user has no interaction history, which means no user embedding and no signal for the ranking model. Rather than showing nothing, the pipeline falls back gracefully through three tiers:
0 interactions
Popularity Fallback
The system returns the most popular, highest-rated movies across all genres. No personalization, but the recommendations are still high quality.
100% popular
1-5 interactions
Content-Based Filtering
After a few likes or watches, the system builds a preliminary user embedding from those movies' embeddings. Cosine-similarity retrieval begins, blended with popular results.
60% popular, 40% personalized
6+ interactions
Full Pipeline
With enough signal, the two-stage pipeline activates fully. The user embedding stabilizes, and the ranking model has enough context to re-score candidates meaningfully.
100% personalized
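The three tiers reduce to a small lookup from interaction count to blend ratio:

```python
def blend_ratio(interactions: int) -> tuple[float, float]:
    """(popular, personalized) mix for the three cold-start tiers."""
    if interactions == 0:
        return (1.0, 0.0)   # tier 1: popularity fallback
    if interactions <= 5:
        return (0.6, 0.4)   # tier 2: content-based blend
    return (0.0, 1.0)       # tier 3: full two-stage pipeline
```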
06
Tech stack
Built with
Go
API Backend
Fast compilation, small binaries, and a concurrency model that handles high-throughput ranking calls without framework overhead.
Python FastAPI
Ranking Service
The ML ecosystem lives in Python. FastAPI gives type-safe endpoints with Pydantic validation and sub-millisecond overhead.
Supabase + pgvector
Database & Vector Search
Postgres with pgvector replaces separate Elasticsearch and Redis instances. HNSW indexes give sub-50ms kNN queries at this scale.