Building an 18-class idle RPG by directing AI agents
I set out to find the ceiling of what you can build by directing AI agents rather than typing the code yourself — and ended up with a genuinely deep browser game: Idle Dungeon Crawler. A party of pixel-art heroes auto-crawls an endless procedural dungeon while you manage the village above and program how each hero fights with a little trigger→action rules language. You don't press attack. You write the tactics and watch them play out.
The surprising lesson wasn't about how smart the model is. It was about architecture: the right guardrails turn “AI writes most of the code” from a liability into a fast, safe loop. Here's what the game is, and what made building it this way actually work.
What's in it
The scope crept — happily — far past “idle clicker”:
- 18 hero classes, four with a signature resource system: a Priest builds Devotion toward a free-casting State of Grace; a Gambler banks Luck Coins to rig coin-flip attacks; a Grave Robber stacks Greed for party-wide gold-find; a Summoner fields a menagerie. The Necromancer reanimates the very enemies it just killed.
- 12 ADOM-inspired gods. Convert a hero, tithe for favor, and at the extremes the deity manifests — a rare blessing or a sudden resurrection if you're beloved; a curse, a smiting, or a summoned avatar if you've forsaken it. (One of those avatars is a giant, furious chicken. The hen-goddess does not forgive.)
- 16 races, a morale system (heroes panic and flee, or crit and dodge, depending on their nerve), per-hero economies, a seaside gambling den, and personality traits that drive both banter and behavior — a Greedy hero hoards their gold, a Lone Wolf won't share gear, a Gambling Addict quietly bleeds coin at the tables while idle.
All of it runs on Bun + React 19 + Canvas 2D + Zustand, deploys static to Cloudflare Pages, and every sprite is hand-authored in-repo as a grid of palette characters — zero external art assets.
The thing that made AI-driven dev work: determinism
The core of the game is a pure, deterministic, serializable simulation. No Math.random, no Date.now, no DOM — all randomness flows through seeded RNG streams, the dungeon runs on a fixed 10 Hz integer tick, and the UI can only change state by dispatching commands applied at tick boundaries.
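To make that concrete, here is a minimal sketch of the shape such a core can take. It is not the game's actual code: the state fields, the command kinds, and the choice of mulberry32 as the RNG are all assumptions made for illustration. What it shows is the pattern itself: RNG state lives inside the serializable state, the world advances one integer tick at a time, and queued commands are applied only at the tick boundary.

```typescript
// Hypothetical sketch; none of these identifiers come from the game's code.
interface SimState {
  tick: number;     // integer tick counter (at 10 Hz, one tick = 100 ms of sim time)
  rngState: number; // 32-bit RNG state, stored in the state so saves round-trip
  gold: number;
  heroHp: number;
}

type Command =
  | { kind: "spendGold"; amount: number }
  | { kind: "heal"; amount: number };

// One mulberry32 step as a pure function: state in, value and next state out.
function nextRandom(rngState: number): { value: number; rngState: number } {
  const a = (rngState + 0x6d2b79f5) >>> 0;
  let t = a;
  t = Math.imul(t ^ (t >>> 15), t | 1);
  t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
  return { value: ((t ^ (t >>> 14)) >>> 0) / 4294967296, rngState: a };
}

// The only way outside code changes the state: a command, applied by the sim.
function applyCommand(s: SimState, c: Command): SimState {
  switch (c.kind) {
    case "spendGold":
      return c.amount <= s.gold ? { ...s, gold: s.gold - c.amount } : s;
    case "heal":
      return { ...s, heroHp: Math.min(100, s.heroHp + c.amount) };
  }
}

// One tick: queued commands land first, then the world advances.
function step(state: SimState, queued: readonly Command[]): SimState {
  const s = queued.reduce(applyCommand, state);
  const roll = nextRandom(s.rngState);
  const damage = Math.floor(roll.value * 3); // 0, 1, or 2
  return {
    tick: s.tick + 1,
    rngState: roll.rngState,
    gold: s.gold + 1,
    heroHp: Math.max(0, s.heroHp - damage),
  };
}

// Replays a run from a seed plus a script of commands keyed by tick number.
function run(seed: number, ticks: number, script: Record<number, Command[]> = {}): SimState {
  let s: SimState = { tick: 0, rngState: seed >>> 0, gold: 0, heroHp: 100 };
  for (let i = 0; i < ticks; i++) s = step(s, script[s.tick] ?? []);
  return s;
}
```

Because `run` depends only on its arguments, two calls with the same seed and script produce identical states, which is the property everything else in this section leans on.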
That discipline turned out to be the perfect substrate for agent-driven work, because of one test: a golden-seed hash. A scripted 5,000-tick run from a fixed seed is hashed, and the result is compared against a constant committed to the repo. Any change an agent makes either keeps that hash identical, which shows it didn't perturb the scripted run, or breaks it on purpose, in which case the same commit re-baselines the constant. An unexpected break means non-determinism crept in or a change reached the baseline run without meaning to, and you hunt it down instead of papering over it.
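A golden-seed test along these lines can be sketched as follows. The toy simulation, the FNV-1a hash, and every identifier are stand-ins chosen for illustration; the post doesn't say which hash or script the real test uses. The `extraDraw` flag imitates a change that consumes one extra random number per tick, the kind of perturbation the hash is meant to expose.

```typescript
// Toy stand-in for the real simulation; all names here are hypothetical.
interface ToyState { tick: number; rng: number; gold: number; hp: number }

// Classic 32-bit linear congruential generator step.
function lcg(x: number): number {
  return (Math.imul(x, 1664525) + 1013904223) >>> 0;
}

function toyStep(s: ToyState, extraDraw: boolean): ToyState {
  let rng = lcg(s.rng);
  if (extraDraw) rng = lcg(rng); // imitates a feature that consumes randomness
  const gold = s.gold + ((rng >>> 16) % 7);
  const hp = s.hp > 0 ? s.hp - ((rng >>> 24) % 2) : 100;
  return { tick: s.tick + 1, rng, gold, hp };
}

// Folds a string into a running 32-bit FNV-1a hash.
function fnv1aFold(hash: number, text: string): number {
  let h = hash;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hashes every tick's state, so a transient divergence is caught even if
// the final state happens to converge again.
function goldenHash(seed: number, ticks: number, extraDraw = false): string {
  let s: ToyState = { tick: 0, rng: seed >>> 0, gold: 0, hp: 100 };
  let h = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < ticks; i++) {
    s = toyStep(s, extraDraw);
    h = fnv1aFold(h, JSON.stringify(s));
  }
  return h.toString(16).padStart(8, "0");
}

// In a real test, `expected` would be a string literal committed to the repo
// and updated only in commits that change the baseline on purpose.
function checkGolden(expected: string): { ok: boolean; actual: string } {
  const actual = goldenHash(1234, 5000);
  return { ok: actual === expected, actual };
}
```

Returning `actual` alongside `ok` is a small convenience: when a change is intentional, the new constant can be copied straight from the failure message into the same commit.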
So when an agent added religion, or a new class, or a gambling den, I could see at a glance whether it touched the existing sim. Most features were provably inert for the baseline party; a few deliberately shifted it. Either way, nothing silently drifted. A grep-style purity test backs it up by failing any PR that sneaks Math.random or a DOM import into the sim layer.
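A grep-style purity check needs very little machinery. The sketch below is an assumption about how such a test might look, not the project's actual test: the pattern list, the `Violation` shape, and the file paths in the usage note are invented for illustration. It scans source text line by line, so it works on an in-memory map of file contents; a real test would first read the sim directory from disk.

```typescript
// Hypothetical purity scan; pattern list and names are illustrative.
interface Violation { file: string; line: number; reason: string }

const FORBIDDEN: ReadonlyArray<{ pattern: RegExp; reason: string }> = [
  { pattern: /\bMath\.random\b/, reason: "unseeded randomness" },
  { pattern: /\bDate\.now\b/, reason: "wall-clock time" },
  { pattern: /\b(document|window)\./, reason: "DOM access" },
  { pattern: /from\s+["']react(-dom)?["']/, reason: "UI import in sim layer" },
];

// `files` maps a path to that file's source text.
function findViolations(files: Record<string, string>): Violation[] {
  const out: Violation[] = [];
  for (const [file, source] of Object.entries(files)) {
    source.split("\n").forEach((text, i) => {
      for (const { pattern, reason } of FORBIDDEN) {
        // Patterns have no `g` flag, so `test` keeps no state between calls.
        if (pattern.test(text)) out.push({ file, line: i + 1, reason });
      }
    });
  }
  return out;
}
```

Calling `findViolations({ "sim/combat.ts": "const r = Math.random();" })` reports one violation on line 1. The trade-off of matching raw text is that a comment mentioning `Math.random` also trips the check, which is arguably acceptable for a layer where the word should never need to appear.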
The other multiplier was content-as-JSON. Monsters, items, and sprites are JSON files compiled by a codegen step that validates everything — grid sizes, palette characters, dangling references, spawn coverage — with precise errors. An agent can add a monster by dropping in a file, and the validator catches the mistakes a human reviewer would miss.
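The four checks named above (grid sizes, palette characters, dangling references, spawn coverage) might be sketched like this. The schema is an assumption: the field names, the use of "." for a transparent pixel, and the depth-range model of spawning are invented here, since the post doesn't show the real file format.

```typescript
// Hypothetical content schema; the game's real JSON layout is not shown in the post.
interface SpriteDef {
  id: string;
  width: number;
  height: number;
  palette: Record<string, string>; // palette character -> color
  rows: string[];                  // one string per pixel row
}
interface MonsterDef { id: string; spriteId: string; minDepth: number; maxDepth: number }
interface Content { sprites: SpriteDef[]; monsters: MonsterDef[] }

const TRANSPARENT = "."; // assumed convention for an empty pixel

// Returns every problem found, each with enough detail to locate it.
function validateContent(content: Content, maxDepth: number): string[] {
  const errors: string[] = [];
  const spriteIds = new Set(content.sprites.map((s) => s.id));

  for (const sprite of content.sprites) {
    if (sprite.rows.length !== sprite.height) {
      errors.push(`sprite "${sprite.id}": expected ${sprite.height} rows, found ${sprite.rows.length}`);
    }
    sprite.rows.forEach((row, y) => {
      if (row.length !== sprite.width) {
        errors.push(`sprite "${sprite.id}" row ${y}: expected width ${sprite.width}, found ${row.length}`);
      }
      for (let x = 0; x < row.length; x++) {
        const ch = row.charAt(x);
        if (ch !== TRANSPARENT && sprite.palette[ch] === undefined) {
          errors.push(`sprite "${sprite.id}" row ${y} col ${x}: unknown palette character "${ch}"`);
        }
      }
    });
  }

  for (const m of content.monsters) {
    if (!spriteIds.has(m.spriteId)) {
      errors.push(`monster "${m.id}": dangling sprite reference "${m.spriteId}"`);
    }
  }

  // Spawn coverage: every depth up to maxDepth needs at least one eligible monster.
  for (let depth = 1; depth <= maxDepth; depth++) {
    const covered = content.monsters.some((m) => m.minDepth <= depth && depth <= m.maxDepth);
    if (!covered) errors.push(`spawn coverage: no monster can spawn at depth ${depth}`);
  }
  return errors;
}
```

Collecting all errors instead of throwing on the first one suits agent-driven work: a single codegen run hands back the full list of fixes instead of one per attempt.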
How the agents were actually used
I worked in thin slices: one feature per pull request, each with its own tests, a headless balance-harness run, and a deploy. Roughly 30 PRs over the project.
The fun part was multi-agent workflows for the work that parallelizes. The game's heroes banter constantly — taunting enemies by family, reacting to each other — and every class has its own comic voice. Rather than write those dictionaries serially, I fanned out one agent per class to author them in parallel against a shared schema, then validated and merged the results. Same pattern for the personality-trait dialogue, and even for a final documentation audit, where two agents cross-checked the design docs against the shipped code and reported exactly which claims had gone stale.
And the human stays firmly in the loop where it counts. The hardest feature was gambling addiction — an idle hero who auto-gambles their own gold. The naive version is a determinism landmine: how do you simulate “one bet per minute” over wall-clock time without breaking reproducibility or smuggling in offline progress? The fix was an architecture call, not a code-generation one: treat it exactly like the existing chore-accrual system — a fractional time-carry that crosses a fixed interval to resolve one seeded bet at a time, bounded so a hero's purse bleeds gently instead of bursting to zero. That's the kind of decision the agents execute well but don't reach for on their own.
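A time-carry accrual of that kind can be sketched in a few lines. Everything specific below is an assumption made for illustration: the five-bet cap per update, the 5% stake, the 45% win chance, and the idea that elapsed time reaches the sim as an explicit input so a replay sees the same values. Only the one-minute interval and the overall shape (carry the remainder, resolve one seeded bet per interval crossed, bound the loss) come from the description above.

```typescript
// Hypothetical sketch; constants and names are illustrative, not the game's.
interface IdleGambler { carryMs: number; gold: number; rngState: number }

const BET_INTERVAL_MS = 60_000; // "one bet per minute"
const MAX_BETS_PER_UPDATE = 5;  // assumed cap so a long absence cannot burst

// One mulberry32 step as a pure function: state in, value and next state out.
function nextRandom(rngState: number): { value: number; rngState: number } {
  const a = (rngState + 0x6d2b79f5) >>> 0;
  let t = a;
  t = Math.imul(t ^ (t >>> 15), t | 1);
  t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
  return { value: ((t ^ (t >>> 14)) >>> 0) / 4294967296, rngState: a };
}

// Adds elapsed time to the carry, then resolves one bet per full interval.
function accrueGambling(hero: IdleGambler, elapsedMs: number): IdleGambler {
  const capped = Math.min(Math.max(0, elapsedMs), BET_INTERVAL_MS * MAX_BETS_PER_UPDATE);
  let { carryMs, gold, rngState } = hero;
  carryMs += capped;
  while (carryMs >= BET_INTERVAL_MS) {
    carryMs -= BET_INTERVAL_MS;
    if (gold <= 0) continue; // broke heroes still burn the interval, but place no bet
    const stake = Math.max(1, Math.floor(gold / 20)); // about 5% of the purse
    const roll = nextRandom(rngState);
    rngState = roll.rngState;
    gold += roll.value < 0.45 ? stake : -stake; // house edge: the purse bleeds slowly
  }
  return { carryMs, gold, rngState };
}
```

Two 30-second updates resolve exactly one bet between them, the same as a single 60-second update, because the leftover time is carried instead of dropped. Scaling the stake with the purse is one way to get the "bleeds gently" behavior: losses shrink as the gold does.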
What I took away
Directing agents is genuinely good at breadth — content, tests, parallel authoring, tireless mechanical refactors. What it still needs from a human is the spine: the architectural invariants (a deterministic core, a single mutation funnel per system, a content pipeline that validates itself), the taste calls (“this should be rare — a moment you remember, not a routine”), and the judgment to turn a vague feature into something that won't quietly corrupt the save file three months later.