How I built a semantic recommendation engine
Building a "related content" engine for pennies with the Vercel AI SDK, OpenRouter, and a hybrid scoring algorithm.
When you finish reading an article, the most important thing for engagement is what you read next.
The way I used to handle “Related Posts” was: match a few tags, maybe check the category, and hope for the best. The problem is that tags are manual, prone to human error, and don’t actually capture the meaning of the content.
I wanted a system that actually understands context. If I write about “optimizing React performance,” it should link to “Astro bundle sizes,” even if they don’t share the exact same tag.
So I built a semantic recommendation engine.
But here’s the constraint: I didn’t want to spin up a vector database. I wanted a script that runs at build time, costs nearly nothing, and lives right in the repo.
Here is how I built it using the Vercel AI SDK, Qwen3 embeddings, and some clever caching.
The Strategy: Hybrid Scoring
Pure vector search (semantic similarity) is great, but it has blind spots. Two posts can sit close together in embedding space without one being a genuinely useful next read for the other.
To get the best results, I implemented a hybrid scoring system. It’s not enough for two articles to sound similar; we need structural relevance too.
The formula looks roughly like this:
- Semantic Similarity (85%): Using cosine similarity on vector embeddings.
- Tag Overlap (10%): A Jaccard index to measure shared tags.
- Category Bonus (5%): A small bump if they belong to the same section (e.g., Blog vs. Works).
This weighting keeps content meaning in the driver's seat while still nudging the results toward items that are structurally related.
FYI: Cosine similarity compares two vector embeddings by looking at the angle between them; if they point in roughly the same direction, the texts they represent mean something similar. The Jaccard index measures how similar two sets are: the size of their intersection divided by the size of their union.
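Concretely, both fit in a few lines of TypeScript. This is a minimal sketch rather than the exact helpers in my script (the Vercel AI SDK also exports a cosineSimilarity helper you can import instead):

// Cosine similarity: dot product over the product of vector lengths (1 = same direction)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Jaccard index: shared tags divided by total distinct tags
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  const shared = [...setA].filter((tag) => setB.has(tag)).length;
  const total = new Set([...a, ...b]).size;
  return total === 0 ? 0 : shared / total;
}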
Generating Embeddings on a Budget
For the embeddings, I used the Qwen3 Embedding 8B model via OpenRouter. It’s incredibly cheap, ranks high on the MTEB leaderboard, and performs well enough for content recommendation.
The logic is simple:
- Flatten the markdown content (strip syntax).
- Chunk it into segments (~1200 characters).
- Generate embeddings for each chunk.
- Mean-pool them into a single vector for the document.
Here is the core flow, using embedMany from the Vercel AI SDK to handle the batching:
// Breaking down the document into chunks
const chunks = chunkText(doc.textForEmbedding);
// Batch embedding generation
const { embeddings } = await embedMany({
  model: openai.embeddingModel('qwen/qwen3-embedding-8b'),
  values: chunks,
  maxParallelCalls: 2,
  maxRetries: 2,
});
// Averaging chunks into one document vector
const pooled = normalize(meanPool(embeddings.map(normalize)));
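(The openai instance above is just a provider pointed at OpenRouter’s OpenAI-compatible API.) The helpers it leans on, chunkText, normalize, and meanPool, look roughly like this; a sketch of the approach, not the literal code from my script:

// Split the flattened text into ~1200-character segments
function chunkText(text: string, size = 1200): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks.length > 0 ? chunks : [''];
}

// Scale a vector to unit length so the cosine math stays well-behaved
function normalize(vec: number[]): number[] {
  const length = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return length === 0 ? vec : vec.map((v) => v / length);
}

// Element-wise average of all chunk vectors into one document vector
function meanPool(vectors: number[][]): number[] {
  const out = new Array(vectors[0].length).fill(0);
  for (const vec of vectors) {
    for (let i = 0; i < vec.length; i++) out[i] += vec[i];
  }
  return out.map((v) => v / vectors.length);
}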
The Caching Layer
The fastest way to burn money on AI APIs is to regenerate embeddings for content that hasn’t changed.
Since this script runs every time I deploy or build the site, I needed a way to skip 99% of the work.
I implemented a content-addressable cache. Before processing a file, I generate a SHA256 hash of its content and metadata. That hash is then compared against the previous run’s hash stored in a local .cache/embeddings-cache.json file.
const contentHash = sha256(textForEmbedding);
const cached = cache[doc.key];
if (cached && cached.hash === contentHash) {
  // Content hasn't changed, reuse the old vector
  docEmbeddings[doc.key] = cached.embedding;
  continue;
}
// ... otherwise, call the API and pay a fraction of a cent
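The hashing and persistence are plain Node, nothing exotic. A rough sketch, assuming the cache file path mentioned above:

import { createHash } from 'node:crypto';
import { existsSync, readFileSync, writeFileSync } from 'node:fs';

const CACHE_PATH = '.cache/embeddings-cache.json';

// Content-addressable key: same text in, same hash out
function sha256(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

// Load the previous run's cache if it exists, otherwise start from scratch
const cache: Record<string, { hash: string; embedding: number[] }> = existsSync(CACHE_PATH)
  ? JSON.parse(readFileSync(CACHE_PATH, 'utf-8'))
  : {};

// After embedding any new or changed docs, persist the updated cache for the next build
writeFileSync(CACHE_PATH, JSON.stringify(cache));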
This makes subsequent builds nearly instant. I only pay for the API when I actually write a new post or edit an existing one.
The Scoring Algorithm
Once we have vectors for every document, we need to compare them.
I used a simple O(n²) loop to compare every document against every other document. For a blog with hundreds or even a few thousand posts, this is negligible (milliseconds). If I ever hit 100,000 posts, I’ll reconsider a real database.
This is where that hybrid logic comes in:
// Inside the comparison loop
const semantic = cosineSimilarity(aVec, bVec);
const tagSim = jaccard(a.meta.tags, b.meta.tags);
const sameCategory = a.category === b.category ? 1 : 0;
// The weighting magic
const score = 0.85 * semantic + 0.1 * tagSim + 0.05 * sameCategory;
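Zoomed out, the whole comparison is just two nested loops plus a sort. Roughly like this, where docs, docEmbeddings, and TOP_N are stand-ins for whatever the real script calls them:

const TOP_N = 4;
const recommendations: Record<string, { key: string; score: number; reason: string }[]> = {};

for (const a of docs) {
  const aVec = docEmbeddings[a.key];
  const scored: { key: string; score: number; reason: string }[] = [];

  for (const b of docs) {
    if (a.key === b.key) continue; // never recommend a post to itself
    const bVec = docEmbeddings[b.key];

    // The hybrid scoring from above
    const semantic = cosineSimilarity(aVec, bVec);
    const tagSim = jaccard(a.meta.tags, b.meta.tags);
    const sameCategory = a.category === b.category ? 1 : 0;
    const score = 0.85 * semantic + 0.1 * tagSim + 0.05 * sameCategory;

    scored.push({
      key: b.key,
      score,
      reason: `sem=${semantic.toFixed(3)} tag=${tagSim.toFixed(2)} cat=${sameCategory}`,
    });
  }

  // Keep only the strongest matches for each document
  recommendations[a.key] = scored.sort((x, y) => y.score - x.score).slice(0, TOP_N);
}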
The result is a simple recommendations.json file generated at build time.
{
  "blog/building-animated-sprite-hero": [
    {
      "key": "blog/building-perfect-toc-component",
      "score": 0.847,
      "reason": "sem=0.842 tag=0.33 cat=1"
    }
    // ...
  ]
}
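Writing it out is one line at the end of the script (the output path here is illustrative, not necessarily what my repo uses):

// Persist the scored recommendations so the Astro build can import them as static data
writeFileSync('src/data/recommendations.json', JSON.stringify(recommendations, null, 2));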
Integrating into Astro
This step is trivial. I added the script to my package.json to ensure it runs before the main build.
"scripts": {
"recommendations": "tsx scripts/build-recommendations.ts",
"build": "pnpm run recommendations && astro check && astro build"
}
Inside the actual blog post component, I import the JSON and render the cards. No client-side fetching, no API latency, no database connection strings. Just static JSON.
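Inside the .astro component’s frontmatter it boils down to a lookup; something like this, with illustrative paths and prop names:

// Inside the blog post layout's frontmatter
import recommendations from '../data/recommendations.json';

const { postKey } = Astro.props; // e.g. "blog/building-animated-sprite-hero"
const related = recommendations[postKey as keyof typeof recommendations] ?? [];
// `related` is then mapped over to render the recommendation cards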
Why this works better
By combining semantic search with the structure I already had (tags and categories), the recommendations feel strictly better than tag matching alone.
If I write a technical deep dive, the system focuses on the tech stack (semantic). If I write a life update, it focuses on tags (journaling, life).
It’s “smart enough” automation that stays out of the way. And since it uses Qwen via OpenRouter with extensive caching, the total cost to run this for the entire year will likely be less than a cup of coffee.
Not bad for a JSON file.