Why I Switched from Orama to Pagefind
Why I use Pagefind: A deep dive into chunked indexing, bandwidth scaling, and setting up a production-grade search for Astro.
I previously wrote about building search with Orama. It’s a fantastic piece of engineering—an in-memory database running entirely in the browser.
But it has a fatal flaw for content sites: Scalability.
Orama (and Fuse.js/Lunr) are monolithic. To search anything, the user must download everything. If you have 500 posts, the user downloads 500 posts’ worth of metadata before they can type a single letter in the search bar.
Pagefind flips this model. Instead of sending the database to the user, it splits the index into thousands of tiny binary “chunks.” The browser only downloads the chunks relevant to the specific search query.
Here is how I implemented Pagefind in Astro, covering the parts the docs skip, like custom ranking, complex filtering, and fixing the “Dev Mode” headache.
Why I love the chunking approach
When you build your site, Pagefind scans your HTML and generates a static “API” in your dist/ folder.
- pagefind.js (30KB): The tiny client entry point.
- pagefind-entry.json (4KB): A map of all words to “index chunks.”
- Index Chunks (10-20KB each): Binary files containing lists of documents that match specific word hashes.
The Flow:
- User types “React”.
- pagefind.js hashes “React” and looks up which chunk contains “R” words.
- It fetches only index_chunk_a1b2.pf_index.
- It finds the matches and fetches the fragments for those specific URLs to display the snippet.
Total bandwidth: ~50KB. Orama bandwidth: ~600KB (for the same dataset).
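To make the flow above concrete, here is a toy model of chunked lookup. It is only a sketch: the hash function, the chunk layout, and the names chunkIdFor, buildChunks, and search are all made up for illustration and are not Pagefind's real format or internals. The point is just that a query touches one chunk instead of the whole index.

```javascript
// Toy model of chunked lookup. NOT Pagefind's real hashing or file format.

// Assign a word to one of `chunkCount` buckets with a simple string hash.
function chunkIdFor(word, chunkCount) {
  let hash = 0;
  for (const ch of word.toLowerCase()) {
    hash = (hash * 31 + ch.codePointAt(0)) % chunkCount;
  }
  return hash;
}

// "Build time": split a word -> pages mapping into separate chunks.
function buildChunks(wordIndex, chunkCount) {
  const chunks = Array.from({ length: chunkCount }, () => new Map());
  for (const [word, pages] of Object.entries(wordIndex)) {
    chunks[chunkIdFor(word, chunkCount)].set(word.toLowerCase(), pages);
  }
  return chunks;
}

// "Query time": touch only the chunk the query word hashes to.
function search(chunks, query, fetchLog) {
  const id = chunkIdFor(query, chunks.length);
  fetchLog.push(id); // stands in for a single network request
  return chunks[id].get(query.toLowerCase()) ?? [];
}

const chunks = buildChunks(
  {
    react: ['/posts/hooks'],
    astro: ['/posts/islands'],
    search: ['/posts/pagefind'],
  },
  4
);
const fetchLog = [];
const hits = search(chunks, 'React', fetchLog);
console.log(hits, `chunks fetched: ${fetchLog.length} of ${chunks.length}`);
```

A monolithic index is the same idea with a single chunk, so every query pays for every word on the site.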
1. The Setup
The npx pagefind --site dist command works in prod, but it breaks your local dev workflow because astro dev doesn’t generate a dist folder.
We need a setup that works in both environments.
Installation:
pnpm add -D pagefind
The Config:
Create a pagefind.yml in your root. This keeps your CLI commands clean.
# pagefind.yml
site: dist
exclude_selectors:
- 'nav'
- 'footer'
- '.related-posts'
glob: '**/*.html'
The “Dev Mode” Fix: Since Pagefind needs static HTML to crawl, it cannot index your site while it’s in hot-reloading dev mode. The workaround is to build the index once (this needs an existing dist folder, so run pnpm build at least once first), and then tell Astro to serve it as a static asset.
Add this to package.json:
"scripts": {
  "dev": "pnpm build:search && astro dev",
  "build": "astro build && pnpm build:search",
  "build:search": "pagefind --site dist"
}
Now, we need to allow Astro to serve the pagefind.js file from the dist folder during dev. Update astro.config.mjs:
import { defineConfig } from 'astro/config';

export default defineConfig({
  vite: {
    server: {
      // Allow serving files from the dist directory during dev
      fs: {
        allow: ['dist'],
      },
    },
  },
});
Note: In dev, the search results will be “stale” (from the last build). This is the trade-off.
2. The Schema: Decorating HTML
Pagefind doesn’t use a JSON schema. It uses your DOM. You control the index using Data Attributes.
The Basics
Add a data-pagefind-body attribute to the element that wraps your main content, then add metadata attributes to the elements you want to index:
<article data-pagefind-body>
  <!-- Only index the title and content, ignore the rest -->
  <h1 data-pagefind-meta="title">{title}</h1>
  <!-- Index the image URL as metadata for the UI -->
  <img src={heroImage} data-pagefind-meta="image[src]" />
  <div class="content">
    <slot />
  </div>
</article>
Advanced: Weighting & Filtering
This is where Pagefind becomes powerful. You can boost specific terms and create facets without writing config files.
<!-- BOOSTING: Words in the intro are 2x more important -->
<p class="intro" data-pagefind-weight="2.0">{description}</p>
<!-- FILTERING: Create a 'Type' facet for the UI -->
<span data-pagefind-filter="type:blog" class="hidden"></span>
<!-- FILTERING: Create a 'Tag' facet for every tag -->
{tags.map(tag => (
  <span data-pagefind-filter={`tag:${tag}`} class="hidden"></span>
))}
Now a page that mentions “Astro” in its intro paragraph ranks higher than a page that only mentions it in regular body text. To learn more, check out the weighting and metadata docs.
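Declaring a filter is only half of it; the search call has to pass the selected facets. Here is a minimal sketch of that step. buildSearchOptions is a hypothetical helper of mine, not part of Pagefind. What it relies on from Pagefind is that search() accepts an options object with a filters key as its second argument. As I understand the filtering docs, an array value means a page must match all of the listed values.

```javascript
// Hypothetical helper: turn UI selections into a Pagefind search options object.
// `selected` maps a filter name (as declared via data-pagefind-filter) to the chosen values.
function buildSearchOptions(selected) {
  const filters = {};
  for (const [name, values] of Object.entries(selected)) {
    if (values.length === 0) continue; // skip facets with nothing selected
    // A single value is passed as a string, several values as an array.
    filters[name] = values.length === 1 ? values[0] : values;
  }
  return Object.keys(filters).length > 0 ? { filters } : {};
}

const options = buildSearchOptions({
  type: ['blog'],
  tag: ['astro', 'search'],
  author: [], // nothing selected, so this facet is dropped
});
console.log(options);

// With a loaded Pagefind module, the object goes in as the second argument:
// const search = await pagefind.search('islands', options);
```

Keeping this as a pure function means the facet UI can be tested without a built index.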
3. The Search UI
Since pagefind.js is a generated asset, we can’t import it normally. We must use a dynamic import pointing to the URL.
import { useState, useEffect } from 'preact/hooks';

export default function SearchModal() {
  const [pagefind, setPagefind] = useState(null);
  const [results, setResults] = useState([]);

  // Lazy load on open
  useEffect(() => {
    async function load() {
      // 1. Dynamic import from the public URL
      // @vite-ignore keeps Vite from trying to bundle this
      const lib = await import(/* @vite-ignore */ '/pagefind/pagefind.js');
      await lib.init();
      setPagefind(lib);
    }
    load();
  }, []);

  const handleSearch = async query => {
    if (!pagefind || !query) return;
    // 2. Run Search
    const search = await pagefind.search(query);
    // 3. Load data for top 5 results
    // Pagefind returns "pointers" first, then we fetch the data
    const topResults = await Promise.all(
      search.results.slice(0, 5).map(r => r.data())
    );
    setResults(topResults);
  };

  // ... Render UI
}
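For the render step, it helps to know what a loaded result contains. The sketch below pulls out the fields a result list typically needs: url, excerpt, and the meta values set through data-pagefind-meta. searchTop and stubPagefind are hypothetical names of mine. The Pagefind module is passed in as a parameter so the function can be tried with a stub instead of a real index.

```javascript
// Hypothetical helper: load the first `limit` results and keep only what the UI needs.
// `pagefind` is the module returned by the dynamic import.
async function searchTop(pagefind, query, limit = 5) {
  if (!query.trim()) return [];
  const search = await pagefind.search(query);
  const loaded = await Promise.all(
    search.results.slice(0, limit).map(r => r.data())
  );
  return loaded.map(d => ({
    url: d.url,
    title: d.meta?.title ?? d.url, // fall back to the URL if no title metadata exists
    image: d.meta?.image ?? null,  // populated by data-pagefind-meta="image[src]"
    excerpt: d.excerpt,            // HTML string with <mark> around matched terms
  }));
}

// Stub standing in for Pagefind, with invented data, for illustration only.
const stubPagefind = {
  async search() {
    return {
      results: [
        { data: async () => ({ url: '/a/', meta: { title: 'A' }, excerpt: '<mark>astro</mark> tips' }) },
        { data: async () => ({ url: '/b/', meta: {}, excerpt: 'more <mark>astro</mark>' }) },
      ],
    };
  },
};

searchTop(stubPagefind, 'astro', 1).then(items => console.log(items));
```

Because excerpt is HTML, a Preact component would need dangerouslySetInnerHTML (or its own highlighting) to show the marked terms.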
Where Pagefind Shines (and Fails)
Where it is Extremely Effective
- Long-Form Documentation: Pagefind supports Sub-Result Anchors. If you search “Permissions”, and you have a massive guide called “Linux Basics”, Pagefind can return a direct link to the ## Permissions H2 inside that page (/linux-guide#permissions), as long as the heading has an id. Orama struggles with this without massive index bloat. With the JavaScript API, each loaded result lists these under sub_results; in the prebuilt Pagefind UI, enable them with showSubResults: true.
- Multi-Site Search: If you have a Blog (Astro) and a Documentation site (Starlight), Pagefind can merge indexes. You can run one search bar that queries both blog.com/pagefind and docs.com/pagefind seamlessly in the browser.
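Here is a rough sketch of how the sub-result deep links could be used in a custom UI. It assumes a loaded result (the object from await r.data()) carries a sub_results array whose entries have title, url, and excerpt, which is how I read the JavaScript API docs. toSectionLinks is a hypothetical helper and the sample data is invented.

```javascript
// Hypothetical helper: flatten one loaded result into deep links, one per matching section.
function toSectionLinks(result) {
  const subs = result.sub_results ?? [];
  if (subs.length === 0) {
    // No section-level matches available: link to the page itself.
    return [{ title: result.meta?.title ?? result.url, url: result.url }];
  }
  return subs.map(sub => ({ title: sub.title, url: sub.url }));
}

// Sample data shaped like a loaded result; the values are made up.
const sample = {
  url: '/linux-guide/',
  meta: { title: 'Linux Basics' },
  sub_results: [
    {
      title: 'Permissions',
      url: '/linux-guide/#permissions',
      excerpt: 'File <mark>permissions</mark> control who can read a file.',
    },
  ],
};
console.log(toSectionLinks(sample));
```

The fallback branch keeps the UI working for pages whose headings have no id, since those cannot be deep-linked.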
Where it Fails
- Typo Tolerance: This is the biggest trade-off. Orama uses Levenshtein distance to match, for example, “astor” to “astro.” Pagefind does not. Pagefind is a stemming engine. It knows “run” matches “running.” But if you type “rnning”, it finds nothing. It relies on prefix matching, so typing “prog” finds “programming”, but “pgramming” finds nothing.
- Dev Experience: There is no way around it: having to run a build to update the search index is annoying. If you write a new post, you won’t see it in search until you restart the server with a fresh build.
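To make the typo-tolerance trade-off concrete, here is a small sketch contrasting the two matching styles. It is a textbook Levenshtein implementation next to a plain prefix check, not the actual code of Orama or Pagefind. The maxEdits threshold of 2 is an arbitrary value I picked for the example.

```javascript
// Classic dynamic-programming Levenshtein distance (insert, delete, substitute),
// using a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for the prefixes a[0..i-1) and b[0..j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value from the previous row, same column
      row[j] = Math.min(
        above + 1,      // delete a character from `a`
        row[j - 1] + 1, // insert a character into `a`
        diag + (a[i - 1] === b[j - 1] ? 0 : 1) // substitute, or free match
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Fuzzy style: accept terms within a small edit distance of the query.
const fuzzyMatch = (query, term, maxEdits = 2) => levenshtein(query, term) <= maxEdits;

// Prefix style: accept terms that start with the query.
const prefixMatch = (query, term) => term.startsWith(query);

console.log(fuzzyMatch('astor', 'astro'), prefixMatch('astor', 'astro'));
console.log(fuzzyMatch('rnning', 'running'), prefixMatch('rnning', 'running'));
console.log(prefixMatch('prog', 'programming'), prefixMatch('pgramming', 'programming'));
```

The typos pass the fuzzy check and fail the prefix check, which is exactly the gap described above.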
Conclusion
Pagefind is the “Adult” choice for static search. It isn’t as flashy as Orama, and it doesn’t do vector embeddings or any other AI magic.
But it scales from 10 pages to 10,000 pages without changing a line of code.
What’s next?
This is part of a series of posts on implementing search for static sites:
- The Right Way to Add Orama Search to Astro — simple, zero-config search for small to medium sites
- Why I Switched from Orama to Pagefind (you are here) — chunked index for better scalability
- Meilisearch is the Best Search You’ll Never Need — server-side search with advanced features
- Why I Didn’t Use Google Programmable Search (coming soon) — the hidden costs and indexing delays that make it impractical
- I Tried 4 Search Engines So You Don’t Have To (coming soon) — comprehensive comparison from a small blog perspective
All with practical examples from a real production blog.