How I'd Actually Build Government Software That Doesn't Fall Over

My first self-hosted setup was a Kubernetes cluster I did not need.

K3s, then a multi-node Docker Swarm, then NFS, then a week debugging networking I did not understand, all for a handful of services that would have been perfectly happy on one box. I tore the whole thing down and rebuilt it on a single boring node. It has not fallen over since.

That is a lesson I keep relearning: most systems do not fail because they were too simple. They fail because someone got clever. Public software fails the same way, except the blast radius is a few million students instead of my home lab.

Last time I argued that CBSE’s portal didn’t need an attacker to fall over: it was built that way. So here is the obvious next question: what would you build instead?

Not a heroic answer. Heroics are what you get when experts parachute in after the breach. I want the boring answer. The fix for a CBSE-class meltdown is architectural, not heroic: a set of defaults that make the breach, the leak, and the meltdown hard to commit in the first place.

Four of them. Here they are.

Key Takeaways

The fix for a CBSE-class meltdown is architectural, not heroic: four defaults that make the breach, the leak, and the overload structurally hard to commit, instead of experts cleaning up after each one.

Federate instead of building one national monolith. India already has the blueprint in DIGIT: a hardened shared core with isolated cells, so one compromise or one traffic surge can’t spill into the next agency.

Make server-rendered HTML the canonical layer, and remove the public-storage toggle entirely. Since April 2023, every new S3 bucket blocks public access by default; the platform should make the insecure state impossible, not just discouraged.

Pave a golden path so the secure deploy is the lazy deploy, then mind the catch: without sustained funding and central stewardship, a shared platform rots into a more elegant single point of failure.

1. Federated, not monolithic

The first instinct is usually wrong: build one giant government platform and move everything onto it. One login, one stack, one place to maintain. It sounds like consolidation. It is a single point of failure with national reach. One compromise spills everywhere. One traffic surge degrades everyone. That is the monolith from the last post, just bigger.

The fix is not less sharing. It is sharing the right layer. One hardened core, one security baseline, one review process, but many isolated cells, so a breach or an overload in one agency cannot cross into the next.

India already has a model for this. DIGIT, the platform from eGov Foundation, is built exactly this way: open source, data in shared registries, a federated architecture, role-based access, PII encrypted by default. The blueprint exists. It is just not the default.

Picture three zones. A small trusted core that changes rarely and gets reviewed deeply: identity, payments, records, notifications, audit. A configurable middle where agency teams assemble forms, workflows, and dashboards from vetted parts. And a sandboxed edge for the genuinely custom stuff, where extensions are signed, least-privilege, and reviewed before they ever run. The closer you get to the core, the harder it is to touch. That is the whole point.
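To make the edge rule concrete, here is a minimal sketch of an admission check for extensions: signed, reviewed, and asking only for permissions the edge is allowed to grant. Everything named here is an assumption for illustration: the permission strings, the `Extension` shape, and the use of an HMAC with a shared key. A real platform would more likely use asymmetric signatures, so that the party verifying cannot also sign.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Tuple

# Hypothetical permission names; a real platform would define its own.
EDGE_ALLOWED_PERMISSIONS = frozenset({"forms:read", "notifications:send"})


@dataclass(frozen=True)
class Extension:
    name: str
    permissions: Tuple[str, ...]
    reviewed: bool
    signature: str  # hex HMAC-SHA256 over the canonical manifest


def manifest_bytes(name: str, permissions: Tuple[str, ...]) -> bytes:
    # Canonical JSON so the signer and the verifier hash identical bytes.
    manifest = {"name": name, "permissions": sorted(permissions)}
    return json.dumps(manifest, sort_keys=True).encode("utf-8")


def sign_manifest(key: bytes, name: str, permissions: Tuple[str, ...]) -> str:
    return hmac.new(key, manifest_bytes(name, permissions), hashlib.sha256).hexdigest()


def admit(ext: Extension, key: bytes) -> Tuple[bool, str]:
    """Return (allowed, reason). Every check must pass before the extension runs."""
    expected = sign_manifest(key, ext.name, ext.permissions)
    if not hmac.compare_digest(expected.encode(), ext.signature.encode()):
        return False, "bad signature"
    if not ext.reviewed:
        return False, "not reviewed"
    extra = set(ext.permissions) - EDGE_ALLOWED_PERMISSIONS
    if extra:
        return False, "permissions outside edge allowlist: " + ", ".join(sorted(extra))
    return True, "ok"
```

The ordering is the design choice: the signature is checked first, so an unsigned manifest never gets as far as arguing about its permissions.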

| | One giant monolith | One core, many cells |
| --- | --- | --- |
| A breach | One compromise exposes everything | Contained to a single cell |
| A result-day surge | Couples to payments, notifications, everything | Hits one tenant; the rest stay up |
| Identity, payments, audit | Reinvented in every project | Solved once in the trusted core |
| Blast radius | National | One tenant |
| Custom needs | Bolted into the shared codebase | Sandboxed, signed, least-privilege at the edge |

2. HTML is the canonical layer, not an afterthought

Here is the take that will annoy half the people reading this: a React-first single-page app is the wrong default for a government portal.

Not because React is bad. Because of what the job actually is. A public portal has to work for a decade, on every device a citizen owns, including the cheap Android phone on a weak signal, maintained by a rotating cast of contractors who did not write the original code. An SPA fights all three. It pushes work the browser does for free (focus management, page titles, history, scroll position) into JavaScript that someone has to reimplement correctly and usually does not. It drags in a large dependency graph, which is more supply-chain surface. And it churns: the framework patterns you build on today are deprecated advice in three years.

Look at who actually runs high-stakes software in India and has to live with the maintenance. Zerodha, the largest stockbroker in the country, publishes its stack. Go for the throughput-critical services. Vue for web, after giving up on Angular. Flutter for mobile, after giving up on native and React Native. You do not have to copy the recipe. Copy the values: fewer moving parts, self-hosted control, and a willingness to throw out whatever churns.

The boring foundation is server-rendered HTML with progressive enhancement. Build the page so it works as plain HTML over HTTP, then layer on speed and interactivity for browsers that can take it. This is not a step backward to 2008. It is what Hotwire's Turbo does (swap the body for an SPA-like feel, still works with JavaScript off), what Unpoly does, and the philosophy Remix is built on: forms and URLs are the state model.
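Here is roughly what "forms and URLs are the state model" looks like at its smallest: a sketch of a result lookup as a plain WSGI app, using only the Python standard library. The `RESULTS` data, the `/result` path, and the page markup are placeholders I made up. The point is that the roll number lives in the query string and the form submits with an ordinary GET, so the page works with JavaScript off and any enhancement is layered on top.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical in-memory data standing in for a real results store.
RESULTS = {"1001": "Pass", "1002": "Pass with distinction"}

PAGE = """<!doctype html>
<html lang="en">
<head><meta charset="utf-8"><title>Check result</title></head>
<body>
<h1>Check result</h1>
<form method="get" action="/result">
  <label for="roll">Roll number</label>
  <input id="roll" name="roll" value="{roll}" required>
  <button type="submit">Check</button>
</form>
{outcome}
</body>
</html>"""


def app(environ, start_response):
    """WSGI entry point: all state arrives in the URL, all output is HTML."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    roll = query.get("roll", [""])[0]
    outcome = ""
    if roll:
        result = RESULTS.get(roll)
        text = f"Result for {roll}: {result}" if result else f"No result found for {roll}"
        outcome = f"<p>{escape(text)}</p>"  # escape user input before echoing it
    body = PAGE.format(roll=escape(roll, quote=True), outcome=outcome).encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the app can be handed to `wsgiref.simple_server.make_server` from the standard library. Because the result page is just a URL, it can be bookmarked, shared, and cached without any client-side state.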

The rule that makes it real is declaring compatibility tiers up front. Tier 1 is baseline HTML and CSS with minimal JavaScript, and it must be able to complete every critical task on its own: check a result, pay a fee, file a re-evaluation. Tier 2 is the enhanced experience for modern browsers. Tier 3 is the nice-to-haves. If a feature only works in Tier 3, it is not allowed to be the only way to do something that matters.
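The tier rule is easy to state and easy to forget, so it helps to write it down as data that a build can check. This is a minimal sketch under my own assumptions: the `Task` shape and the task names are illustrative, not an existing tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Task:
    name: str
    critical: bool
    tiers: FrozenSet[int]  # tiers in which the task can be completed end to end


def tier_violations(tasks: Iterable[Task]) -> List[str]:
    """Names of critical tasks that cannot be completed in Tier 1."""
    return [t.name for t in tasks if t.critical and 1 not in t.tiers]


# Hypothetical declaration for an exam portal.
TASKS = [
    Task("check result", critical=True, tiers=frozenset({1, 2, 3})),
    Task("pay fee", critical=True, tiers=frozenset({1, 2})),
    Task("file re-evaluation", critical=True, tiers=frozenset({2, 3})),
    Task("animated score chart", critical=False, tiers=frozenset({3})),
]
```

Here `tier_violations(TASKS)` flags "file re-evaluation", because it matters and has no Tier 1 path. The animated chart is fine, because nothing critical depends on it.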

One caution, because I have been burned by it: do not make a niche platform feature load-bearing. Web components need polyfills on older browsers. Some CSS scoping silently breaks on old Safari. Use the broadly supported thing as the floor, and treat everything fancy as enhancement you can lose without breaking the task.

HTML works. CSS makes it better. JavaScript makes it best. Take the top two away and the task still completes.

3. Make the insecure thing impossible, not discouraged

The CBSE leak, the BHIM exposure, the NACH PDFs: same root cause. Raw object storage is too easy to misconfigure. The answer is not to ban S3.

The providers have already raised the floor. Since April 2023, every new S3 bucket has Block Public Access on and ACLs off by default. Cloudflare R2 is never public unless you explicitly make it so. MinIO, which you can self-host on sovereign hardware, is private until an operator opts out. By default, the open bucket should not happen.

And yet it keeps happening, because a default is not a wall. A careless operator can still flip the toggle, write a policy that is too broad, or paste a credential into a public object. As long as the switch exists where application developers can reach it, someone, someday, on a 66-day deadline, will throw it.

So take the switch away. Move the control point up into one opinionated storage primitive that the whole platform goes through:

Private by default, with no developer-accessible public toggle at all.
Access only through signed, short-lived URLs from typed helpers. No hand-rolled bucket policies.
A separate, reviewer-approved publish workflow for the rare asset that is genuinely meant to be public.
Encryption, malware scanning, retention, and audit logging applied automatically.
Storage tied to application identity and authorization, not to a policy file someone wrote at 2 a.m.

The win is not just developer convenience. It is that there is one narrow, documented, auditable path in and out of storage, instead of N hand-configured buckets each waiting for the next nineteen-year-old to find.

N buckets, N chances to get it wrong. Or one door, watched, logged, and private by default.
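The "signed, short-lived URLs from typed helpers" item in the list above can be sketched in a few lines. This is an illustration, not any provider's API: the function names, the `MAX_TTL_SECONDS` cap, and the query parameter names are all assumptions. S3, R2, and MinIO each have their own presigned-URL mechanisms, which a real platform would wrap instead.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

MAX_TTL_SECONDS = 300  # hypothetical platform-wide cap on link lifetime


def _signature(key: bytes, object_path: str, expires: int) -> str:
    message = f"{object_path}\n{expires}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(key: bytes, object_path: str, ttl_seconds: int,
             now: Optional[float] = None) -> str:
    """Return a link to a private object that stops working after ttl_seconds."""
    if not 0 < ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError("ttl_seconds must be between 1 and MAX_TTL_SECONDS")
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(key, object_path, expires)})
    return f"{object_path}?{query}"


def verify_url(key: bytes, url: str, now: Optional[float] = None) -> bool:
    """Accept only unexpired links whose signature matches the path and expiry."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(key, parsed.path, expires)
    return hmac.compare_digest(expected.encode(), sig.encode())
```

Note what is missing: there is no argument that makes an object public, and no way to ask for a link that lasts longer than the cap. The helper is the only door, so the insecure state has nowhere to be expressed.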

4. Golden paths: make the secure way the lazy way

Notice the pattern across all three: the secure choice is the only easy choice. Platform engineering has a name for this. The “paved road” at Netflix, the “golden path” at Spotify: an opinionated, supported, well-documented way to build and ship, where the platform does not just tell developers the right thing to do. It does it for them.

The experience to aim for is the one Vercel built around Next.js: zero-config deploys, a preview URL for every pull request, infrastructure that understands the framework. The difference is what it runs on. Build it on vendor-neutral, Kubernetes-first foundations, so the government never wakes up locked into one cloud’s proprietary services.

What the golden path hands you for free: standardized secrets, TLS, logging, metrics, and rollbacks. Branch previews. Audit trails baked in. And framework-aware checks that refuse to ship if the Tier 1 no-JS path is broken, so the baseline cannot quietly rot.
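A check that refuses to ship a broken no-JS path does not need to be sophisticated to be useful. Below is a rough sketch of one such gate, using Python's standard `html.parser`: it flags forms that have no `action` URL or no native submit control, both common signs that a form only works through a JavaScript handler. The two rules are a policy I am assuming for illustration; a real pipeline would more likely drive a browser with scripting disabled and complete the task end to end.

```python
from html.parser import HTMLParser
from typing import List


class FormAudit(HTMLParser):
    """Collects problems likely to break a form when JavaScript is off."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: List[str] = []
        self._in_form = False
        self._has_submit = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form":
            self._in_form = True
            self._has_submit = False
            if not attr.get("action"):
                self.problems.append("form has no action URL")
        elif self._in_form and tag == "button" and attr.get("type", "submit") == "submit":
            # A <button> inside a form submits it unless its type says otherwise.
            self._has_submit = True
        elif self._in_form and tag == "input" and attr.get("type") == "submit":
            self._has_submit = True

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            if not self._has_submit:
                self.problems.append("form has no submit control")
            self._in_form = False


def no_js_problems(html: str) -> List[str]:
    audit = FormAudit()
    audit.feed(html)
    audit.close()
    return audit.problems
```

A deploy step would run this over the rendered Tier 1 pages and fail the build if the list is not empty.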

And the specific lesson from the meltdown, encoded as infrastructure: isolate the peak-event systems. Results, re-evaluation, scanned-sheet serving, payments and refunds each run as independently scalable services. A results-day surge can hammer the results service without dragging payments down with it. Put a WAF, rate limiting, and DDoS protection in front of every public portal, because 1.5 million hits in two minutes is not a question of if. It is a date on the calendar.
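The isolation idea is simple enough to sketch: each public service gets its own budget, so exhausting one does not touch another. Below is a minimal token-bucket limiter keyed by service name. The class names and the numbers are illustrative assumptions, and in practice this would sit in the gateway or WAF layer, not in application code.

```python
from typing import Dict, Tuple


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, then spend one token.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gateway:
    """One independent bucket per service, so a surge in one cannot drain another."""

    def __init__(self, limits: Dict[str, Tuple[float, float]]) -> None:
        self._buckets = {name: TokenBucket(rate, cap) for name, (rate, cap) in limits.items()}

    def allow(self, service: str, now: float) -> bool:
        return self._buckets[service].allow(now)


# Hypothetical limits: (tokens per second, burst capacity).
gateway = Gateway({"results": (100.0, 5.0), "payments": (10.0, 2.0)})
```

With these numbers, a burst of eight simultaneous requests to `results` gets five through and three rejected, while a request to `payments` at the same instant is still served. That is the whole point of the separation.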

The honest part

This is not free, and I would be doing the exact thing I just criticized if I pretended otherwise.

A custom progressive-enhancement runtime is real, ongoing engineering work. It is only worth it as a shared platform amortized across hundreds of services. If you are building one portal, do not invent a runtime. Use a mature stack that already does this: Hotwire on Rails, Phoenix LiveView, Django or Go with Turbo. Inventing your own framework for a single product is its own kind of cleverness, the kind that bites you.

And the biggest risk is not technical at all. A shared platform with no sustained funding, staffing, or authority does not stay a platform. It rots into an insecure dependency that everyone relies on and no one owns. Central stewardship is load-bearing. Without it, you have just built a more elegant single point of failure.

None of this makes a system unbreakable. It removes whole classes of failure: the public bucket, the coupled load spike, the security bolted on after the fact. It turns the common, catastrophic, embarrassing failures into ones that are structurally hard to commit. That is the most honest thing I can promise.

Make the right thing the only easy thing

Every move here is the same move. Federate, so a breach stays in one cell. Start from HTML, so the page works before the JavaScript does. Remove the public-bucket toggle, so no one can throw it. Pave the road, so the secure deploy is the lazy deploy. You are not asking overworked teams to be more careful. You are building a system where the careful thing is the path of least resistance.

But there is a catch I have been dancing around, and it is the one that actually decides whether any of this gets built. You cannot maintain a platform like this under the procurement model that produced the meltdown: the giant, multi-year, single-vendor RFP. That contract shape is the real root cause, and it needs its own post.

Frequently asked questions

What does a federated government platform look like in practice? One hardened, deeply reviewed core for identity, payments, records, and audit, with agency teams assembling forms and workflows on top in isolated cells. India already runs the model: DIGIT from eGov Foundation is open source, federated, role-based, with PII encrypted by default. A breach in one cell stays in that cell.

Why not build a government portal as a React single-page app? A public portal has to run for a decade on cheap phones and weak signals, maintained by rotating contractors. An SPA reimplements what the browser gives you for free, drags in supply-chain surface, and churns every few years. Zerodha, India’s largest broker, picked Vue and Flutter for the same reason: fewer moving parts.

How do you actually stop the open-S3-bucket leak from recurring? Defaults already help: since April 2023, every new S3 bucket blocks public access by default. But a default is not a wall; an operator can still flip it. Remove the toggle entirely, routing all storage through one private-by-default primitive with signed, short-lived URLs and a separate reviewed publish workflow.

What is a “golden path” in platform engineering? The paved road at Netflix, the golden path at Spotify: one opinionated, supported, well-documented way to build and ship, where the platform does the secure thing for developers instead of just recommending it. Standardized secrets, TLS, logging, metrics, rollbacks, and audit trails come with the path, not bolted on after a breach.

Is a shared government platform actually worth the cost? Only as shared infrastructure amortized across hundreds of services. For a single portal, use a mature progressive-enhancement stack (Hotwire on Rails, Phoenix LiveView, or Django with Turbo) rather than inventing a runtime. The real risk isn’t technical: a platform with no funding or owner rots into an insecure dependency.

Sources and further reading

DIGIT (eGov Foundation)
Zerodha tech stack
Hotwire / Turbo
S3 Block Public Access on by default (AWS)
Spotify Golden Paths

What’s next?

This is part of a series on why India’s government software keeps falling over, and how I’d build it so it doesn’t:

Stop Fixing India’s Exams — why you can’t patch a high-stakes exam into honesty, and the continuous competency profile that replaces it
BTS of CBSE’s Infra Meltdown — why the result-day “cyberattack” was almost certainly just demand, and why the portal was built to fall over
How I’d Actually Build Government Software That Doesn’t Fall Over (you are here) — a federated core, HTML-first frontends, and storage you can’t make public
GitHub for Government Work — trading the multi-year, single-vendor RFP for an open commons paid per merged pull request
Assume the Endpoint Is Hostile (coming soon) — locking down the lakhs of devices that reach the core, with a closed system on open foundations

One machine, five layers: the exam, the meltdown, the architecture, the procurement, and the endpoint.
