Engineering Deep Dive
How CineMatch Builds Recommendations
A two-stage pipeline that combines vector similarity search with a learned ranking model to surface movies you will actually want to watch.
01
How recommendations work
The two-stage pipeline
Every recommendation request flows through two stages. First, we cast a wide net using vector search to find movies that are semantically close to the user's taste. Then, a scoring model re-ranks those candidates using richer signals to surface the best results.
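The two stages can be sketched as a single function: retrieval casts the wide net, scoring re-orders it. The function names and candidate count here are illustrative, not CineMatch's actual API.

```python
from typing import Callable

def recommend(user_id: str,
              retrieve: Callable[[str, int], list[str]],
              score: Callable[[str, str], float],
              k: int = 10) -> list[str]:
    """Two-stage pipeline: wide, cheap retrieval, then precise re-ranking."""
    # Stage 1: cast a wide net -- ~50 semantically similar candidates.
    candidates = retrieve(user_id, 50)
    # Stage 2: re-rank with richer signals, keep the top k.
    return sorted(candidates, key=lambda m: score(user_id, m), reverse=True)[:k]
```

The key design point is that the expensive model only ever sees ~50 candidates, not the whole catalog.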
02
The retrieval stage
Vector search with pgvector
Every movie is converted into a 1536-dimensional embedding using OpenAI's text-embedding-3-small model. The input combines the movie's plot summary, genres, release year, and key metadata into a single dense vector that captures its semantic identity.
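Building that single input text might look like the sketch below; the field names and separator are assumptions for illustration, not the exact format CineMatch sends to text-embedding-3-small.

```python
def embedding_input(movie: dict) -> str:
    """Flatten a movie's metadata into one text blob for the embedding
    model. Field names and layout are illustrative assumptions."""
    return " | ".join([
        movie["title"],
        f"Genres: {', '.join(movie['genres'])}",
        f"Year: {movie['year']}",
        movie["plot"],
    ])
```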
User preferences are encoded the same way, built from the embeddings of movies they have liked and watched, weighted by recency.
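One way to realize a recency-weighted average is exponential decay over the age of each interaction; the half-life value and decay scheme below are assumptions, not CineMatch's documented weighting.

```python
def user_embedding(liked: list[tuple[list[float], float]],
                   half_life_days: float = 30.0) -> list[float]:
    """Recency-weighted average of liked movies' embeddings.
    `liked` pairs each embedding with its age in days. Exponential
    decay with a 30-day half-life is an illustrative assumption."""
    dim = len(liked[0][0])
    acc = [0.0] * dim
    total = 0.0
    for vec, age_days in liked:
        w = 0.5 ** (age_days / half_life_days)  # newer likes count more
        total += w
        for i, x in enumerate(vec):
            acc[i] += w * x
    return [x / total for x in acc]
```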
Finding candidates is a nearest-neighbor search: we use pgvector's HNSW index to find the 50 movies with the highest cosine similarity to the user's embedding. This runs in under 50ms, even across the full catalog.
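The HNSW index is an approximate, fast version of the following brute-force search. In SQL the query is roughly `ORDER BY embedding <=> $1 LIMIT 50` (pgvector's `<=>` is cosine distance); the pure-Python equivalent makes the math explicit.

```python
import math

def cosine_sim(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(user_vec: list[float], catalog: dict[str, list[float]], k: int = 50) -> list[str]:
    """Brute-force stand-in for pgvector's HNSW cosine search:
    rank every movie by cosine similarity, keep the closest k."""
    ranked = sorted(catalog.items(),
                    key=lambda kv: cosine_sim(user_vec, kv[1]),
                    reverse=True)
    return [movie_id for movie_id, _ in ranked[:k]]
```

Brute force is O(catalog size) per query; HNSW trades a small amount of recall for logarithmic-ish search time, which is how the sub-50ms figure holds across the full catalog.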
03
The ranking stage
Multi-signal re-ranking
Raw similarity is not enough. A movie can be close in embedding space but poorly rated, or popular but not to the user's taste. The ranking stage combines multiple signals into a single score that balances relevance, quality, and diversity.
Scoring Weights

- Semantic similarity: how close the movie is to the user's taste in embedding space
- Rating: the TMDB community rating, normalized to a 0-1 scale
- Popularity: log-scaled, so blockbusters do not drown out everything else
- Genre match: the fraction of the movie's genres that match the user's preferences
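A linear blend of the four signals might look like the following; the numeric weights and the popularity normalization constant are illustrative placeholders, since the tuned values are not shown here.

```python
import math

def linear_score(similarity: float, rating10: float,
                 popularity: float, genre_overlap: float) -> float:
    """Weighted sum of the four ranking signals.
    Weights are illustrative placeholders, not CineMatch's tuned values."""
    w_sim, w_rating, w_pop, w_genre = 0.5, 0.2, 0.1, 0.2
    rating = rating10 / 10.0                                   # TMDB 0-10 -> 0-1
    pop = min(math.log1p(popularity) / math.log1p(1_000_000), 1.0)  # log damping
    return (w_sim * similarity + w_rating * rating
            + w_pop * pop + w_genre * genre_overlap)
```

The log on popularity is what keeps a blockbuster with 100x the views from getting anywhere near 100x the score.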
Upgrade path: When sufficient real interaction data accumulates, the linear scorer is replaced by a LambdaMART model (LightGBM) that directly optimizes NDCG. The model learns non-linear feature interactions that handcrafted weights cannot capture, such as the relationship between genre preferences and popularity thresholds. The ranker service supports both models and routes between them per-request.
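Per-request routing between the two scorers can be as small as the sketch below; the data threshold and A/B-bucket logic are assumptions about how such a router might decide, not the service's actual policy.

```python
def route(interaction_rows: int, ab_bucket: str,
          threshold: int = 100_000) -> str:
    """Choose a ranker per request. Falls back to the linear scorer
    until enough training data exists; threshold and bucket names
    are illustrative assumptions."""
    if interaction_rows < threshold:
        return "linear"
    return "lambdamart" if ab_bucket == "treatment" else "linear"
```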
04
Evaluation
Measuring recommendation quality
NDCG@10
Measures whether the most relevant movies appear at the top of the list; a relevant movie buried at position 8 earns far less credit than one at position 2.
MRR
How quickly a user finds something they want: the mean reciprocal rank of the first relevant result, averaged across all users.
Hit Rate@10
The simplest test: does the top-10 list contain at least one movie the user would actually enjoy?
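The three metrics are standard and small enough to write out; a minimal sketch, using binary relevance labels per ranked position:

```python
import math

def ndcg_at_k(relevances: list[int], k: int = 10) -> float:
    """NDCG@k: discounted gain of the ranked list vs. the ideal ordering."""
    def dcg(rels: list[int]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def mrr(first_hit_ranks: list[int]) -> float:
    """Mean reciprocal rank of each user's first relevant result (0 = no hit)."""
    return sum(1.0 / r for r in first_hit_ranks if r > 0) / len(first_hit_ranks)

def hit_rate_at_k(relevances: list[int], k: int = 10) -> float:
    """1.0 if the top-k contains at least one relevant item, else 0.0."""
    return 1.0 if any(r > 0 for r in relevances[:k]) else 0.0
```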
| Model | NDCG@10 | MRR | Hit Rate |
|---|---|---|---|
| Popularity Baseline | 0.62 | 0.71 | 0.85 |
| Vector Retrieval Only | 0.76 | 0.89 | 0.95 |
| Two-Stage Pipeline (current) | | | |
05
Cold start
What happens for new users
A new user has no interaction history, which means no user embedding and no signal for the ranking model. Rather than showing nothing, the pipeline falls back gracefully through three tiers:
0 interactions
Popularity Fallback
The system returns the most popular, highest-rated movies across all genres. No personalization, but the recommendations are still high quality.
100% popular
1-5 interactions
Content-Based Filtering
After a few likes or watches, the system builds a preliminary user embedding from those movies' embeddings. Cosine-similarity retrieval begins, blended with popular results.
60% popular, 40% personalized
6+ interactions
Full Pipeline
With enough signal, the two-stage pipeline activates fully. The user embedding stabilizes, and the ranking model has enough context to re-score candidates meaningfully.
100% personalized
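The three tiers reduce to a small lookup from interaction count to blend ratio:

```python
def blend_ratio(interactions: int) -> tuple[float, float]:
    """(popular, personalized) mix for the three cold-start tiers."""
    if interactions == 0:
        return (1.0, 0.0)   # tier 1: popularity fallback
    if interactions <= 5:
        return (0.6, 0.4)   # tier 2: content-based blend
    return (0.0, 1.0)       # tier 3: full two-stage pipeline
```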
06
Tech stack
Built with
Go
API Backend
Fast compilation, small binaries, and a concurrency model that handles high-throughput ranking calls without framework overhead.
Python FastAPI
Ranking Service
The ML ecosystem lives in Python. FastAPI gives type-safe endpoints with Pydantic validation and sub-millisecond overhead.
Supabase + pgvector
Database & Vector Search
Postgres with pgvector replaces separate Elasticsearch and Redis instances. HNSW indexes give sub-50ms kNN queries at this scale.