How to vibe-code a production landing page

What vibe coding means depends on who's using the term.

The casual version is: type some prompts at an AI tool, accept whatever it produces, deploy. That works for prototypes. It does not produce software you can ship to clients or stand behind a year from now.

The version that produces production software uses the same AI tools differently. It treats them as a team of staff, each with a role, each governed by specifications written before they generate anything. The operator (you) directs the team. The team executes. The operator confirms the work is correct through mechanical gates before it ships.

This guide walks through the configuration and methodology I used to ship vibekoded.com, a production landing page with 8 cinematic scenes, a 3D-mesh content engine, server-rendered structured data, and a four-layer voice and code enforcement system. I'm not a developer. I don't write the code. I direct it.

The methodology has a name. Specification-mesh, or SpecMesh for short. Everything below is how it works in practice on a real build.

The team

The configuration I use day-to-day has three agent roles and one operator. The roles aren't bound to specific tools; the tools I name below are what I happen to use, not the only options that fit each role.

The builder. The agent I delegate code generation to. It runs in a terminal, takes specifications, produces code, runs tests, measures performance, and creates a halt marker when it finishes a leg of work. The specific tool I use for this is Claude Code. When it finishes a leg, the response back to me looks like this:

Marker cleared. Capturing the SPEC (no code; halt at SPEC-ratification gate). Five forks plus surfaced load-bearing items require operator decision.

Claude Code, SPEC capture, build log

The builder is forbidden from clearing its own halt markers. That matters. More on this further down.

The researcher. A separate agent I use for exploration when I need to understand unfamiliar territory before writing a SPEC. The researcher surfaces multiple approaches and the tradeoffs between them. It doesn't implement. Its job is to expand my decision space before I commit to a direction.

The adversarial reviewer. A third agent I use to stress-test SPECs and implementations. "What breaks this?" When a SPEC looks clean, the adversarial reviewer asks what cases I haven't considered. When an implementation looks complete, it asks what edge cases the gate missed.

The pattern in practice is: research, then specify, then build, then adversarially review, then ship. Each role has its strengths. The operator integrates them. The methodology below is mostly about how the operator and the builder interact, because that's the loop that runs most often and produces most of the output.

The spec discipline

Every meaningful build decision goes through a specification first. The discipline is simple: capture before generate. The builder does not write code until the operator has ratified the SPEC.

A working SPEC captures six things.

WHAT is being built. The actual deliverable, in one or two sentences.

WHY this matters. The value. If the WHY is thin, the work isn't worth doing yet.

INPUTS. What existing surface or data is in scope. Names the files, the routes, the constraints the new work has to respect.

OUTPUTS. What will exist after the leg completes. Specific files, specific behaviors.

INVARIANTS. What cannot change or regress. These are non-negotiable. The three-leg gate verifies them at close.

EDGE CASES. What could go wrong. The places where assumptions don't hold. Decisions that belong to the operator are surfaced here as "forks," which the operator ratifies before implementation begins.
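The six fields can be made concrete as a small data structure that refuses ratification while any fork is still undecided. This is a minimal sketch, not the SPEC format used on the vibekoded.com build: the names `Spec`, `Fork`, `open_forks`, and `can_ratify` are hypothetical, and the example values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fork:
    """An edge case that needs an operator decision before implementation."""
    question: str
    decision: Optional[str] = None  # None until the operator ratifies a choice


@dataclass
class Spec:
    """Hypothetical container for the six things a working SPEC captures."""
    what: str
    why: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    invariants: List[str] = field(default_factory=list)
    forks: List[Fork] = field(default_factory=list)  # the EDGE CASES

    def open_forks(self) -> List[Fork]:
        """Forks the operator has not decided yet."""
        return [f for f in self.forks if f.decision is None]

    def can_ratify(self) -> bool:
        """Ratifiable only with a WHAT, a WHY, and no undecided forks."""
        has_what_and_why = bool(self.what.strip()) and bool(self.why.strip())
        return has_what_and_why and not self.open_forks()


# Illustrative values only; none of this comes from the real build.
spec = Spec(
    what="Add a navigation link from the landing page to the blog index.",
    why="Readers need a path from the landing page to the posts.",
    inputs=["existing landing page route"],
    outputs=["a working link", "a destination page that renders"],
    invariants=["perf baselines do not regress"],
    forks=[Fork("What happens if the destination fails to load?")],
)
```

In this sketch `spec.can_ratify()` stays false until the operator records a decision on the one open fork, which is the "capture before generate" rule expressed as a check.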

For a guide piece like this post, that might look thin. For real software, it's a forcing function. The SPEC capture is where I surface decisions I didn't know I had to make. The builder reads the WHAT and WHY, then asks: "you didn't say what to do when X happens. Fork or invariant?" Most builds fail because the operator didn't think through the case the agent then has to guess at. The code itself is usually correct. The unspecified assumption is what breaks.

The SPEC also catches load-bearing items the operator missed. Three consecutive amendments I shipped during the vibekoded.com build each had a load-bearing invariant surface during SPEC capture that I'd missed when I wrote the delegation. The methodology converges on catch-at-capture rather than catch-at-implementation, which is genuinely cheaper. A wrong assumption caught at SPEC time costs a five-minute conversation. The same wrong assumption caught at implementation time costs a leg of work plus the rework.

The three-leg gate

After SPEC ratification and implementation, every leg of work passes through a three-leg gate before close.

Structure leg. Lint, build, typecheck. All must return zero issues. This catches mechanical errors before any other measurement runs. If the code doesn't compile, nothing else matters.

Functional leg. The actual work must do what it claims to do. Surface signals are not enough. The builder runs semantic measurement: not "did the click navigate?" but "did the destination render correctly?" The distinction is load-bearing. Surface signal proposes; measurement disposes. Every functional gate runs semantic probes, not just surface checks.

Perf leg. Performance metrics measured against pinned baselines, median-of-N runs to handle measurement noise. Mobile and desktop both. Performance never gets to drift silently.
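The ordering of the legs and the median-of-N comparison can be sketched in a few lines. This is an illustration under stated assumptions, not the gate from the build: `run_gate` and `perf_within_baseline` are hypothetical names, the 5% tolerance is an assumed allowance, and the timings are invented to show why a median is used.

```python
from statistics import median
from typing import Callable, Optional, Sequence, Tuple


def perf_within_baseline(samples_ms: Sequence[float], baseline_ms: float,
                         tolerance: float = 0.05) -> bool:
    """Compare the median of N runs to a pinned baseline.

    `tolerance` is an assumed allowance (5% here) for measurement noise.
    """
    return median(samples_ms) <= baseline_ms * (1 + tolerance)


def run_gate(
    legs: Sequence[Tuple[str, Callable[[], bool]]]
) -> Tuple[bool, Optional[str]]:
    """Run legs in order; stop at the first failure and report its name."""
    for name, check in legs:
        if not check():
            return False, name
    return True, None


# Invented timings: four steady runs and one noisy outlier.
mobile_runs = [1210.0, 1180.0, 1950.0, 1195.0, 1205.0]

result = run_gate([
    ("structure", lambda: True),   # stand-in for lint, build, typecheck
    ("functional", lambda: True),  # stand-in for the semantic probes
    ("perf", lambda: perf_within_baseline(mobile_runs, baseline_ms=1200.0)),
])
```

The median of these five runs is 1205 ms, inside the assumed 5% allowance, so the single 1950 ms outlier does not fail the leg. A mean over the same runs would sit at 1348 ms and report a regression that is really measurement noise.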

The previous post on this blog documents two bugs the three-leg gate caught in one session that would have shipped silently otherwise. Both were Phase-1-class issues. The gate caught them at the right layer, in the right order, by measurement.

The semantic-versus-surface principle came out of an earlier moment in the build. A hash-based check, a quick grep against a build output, reported pass. The actual rendered surface was broken. A better grep wouldn't have caught it. The principle that emerged was: every gate runs semantic measurement, not just surface signals. That principle has caught at least one Phase-1-class issue at every leg of every amendment I've shipped since.
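The difference between the two kinds of check shows up clearly on a page where the marker string exists but the content does not. The sketch below is an assumption-heavy illustration: `surface_check`, `semantic_check`, and both HTML snippets are hypothetical, and it only inspects static HTML. A real probe on a client-rendered page would need a browser.

```python
from html.parser import HTMLParser
from typing import List


def surface_check(html: str, marker: str) -> bool:
    """Grep-style check: passes if the marker string appears anywhere."""
    return marker in html


class _MainText(HTMLParser):
    """Collects the text nodes that appear inside the <main> element."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # greater than zero while inside <main>
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "main" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def semantic_check(html: str, expected_heading: str) -> bool:
    """Passes only if the expected heading text is rendered inside <main>."""
    parser = _MainText()
    parser.feed(html)
    return expected_heading in parser.chunks


# Hypothetical pages: the broken one has the right id but an empty <main>.
broken = ('<html><body><main id="blog-index"></main>'
          '<script>route="blog-index"</script></body></html>')
healthy = '<html><body><main id="blog-index"><h1>Blog</h1></main></body></html>'
```

On the broken page, `surface_check(broken, "blog-index")` passes because the string is present, while `semantic_check(broken, "Blog")` fails because nothing was rendered where a reader would see it.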

The halt-marker protocol

Every leg of work ends with a halt marker, a file in the repo root that the builder creates, named for the leg, that mechanically blocks any further commits until the operator removes it.

The mechanism: a pre-commit hook scans the repo root before every git commit. If any file matching the halt-marker pattern exists, the commit fails. The hook is committed to version control; the markers themselves are gitignored. The markers live only on local disk as roadblocks.
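A hook along these lines could implement the scan. It is a minimal sketch rather than the hook from the build: the `HALT-*.md` pattern is an assumption inferred from the marker filename used in this post, and `find_halt_markers` and `main` are hypothetical names.

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook that refuses commits while a halt marker exists."""
import sys
from pathlib import Path
from typing import List

# Assumed pattern, inferred from the marker name HALT-leg-name-impl-review.md.
HALT_PATTERN = "HALT-*.md"


def find_halt_markers(repo_root: Path) -> List[Path]:
    """Return halt-marker files in the repo root only (no recursion)."""
    return sorted(p for p in repo_root.glob(HALT_PATTERN) if p.is_file())


def main(repo_root: Path = Path(".")) -> int:
    """Return 1 (block the commit) if any marker exists, else 0."""
    markers = find_halt_markers(repo_root)
    if markers:
        print("Commit blocked. Halt markers present:", file=sys.stderr)
        for marker in markers:
            print(f"  {marker.name}", file=sys.stderr)
        return 1  # a non-zero exit status makes git abort the commit
    return 0

# Installed as a hook, the script would finish with: sys.exit(main())
```

Git runs hooks from the root of the working tree, so the default `Path(".")` points at the repo root. Git does not version the `.git/hooks` directory, so a hook that lives in version control is usually wired up by pointing `core.hooksPath` at a tracked directory.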

The operator visually confirms the work matches the claim. Reads the rendered surface. Clicks the things. Looks at the perf numbers. Reads the audit-trail entry. If everything matches, the operator runs:

rm HALT-leg-name-impl-review.md

The builder is forbidden from clearing its own halt markers, and the pre-commit hook refuses any commit while a halt marker exists. The enforcement is mechanical. It does not depend on either of us remembering the methodology rules.

A useful analogy: masonry inspection stamps. The contractor pours the foundation. They cannot pour the next floor without an inspector physically stamping the permit. The contractor is forbidden from stamping their own work. The stamp is the gesture. The work stops mechanically until it's there. The pre-commit hook is the building code that prevents pouring the next floor without the stamp. The methodology rule that the agent can't clear its own markers is the building code that prevents the contractor from forging the inspector's signature.

The most recent post on this blog walks through how this protocol caught a React 19 and GSAP DOM race that the surface signal said worked. Without the gate, the navigation amendment would have shipped a visibly-present-but-actually-broken destination page.

This protocol came out of an earlier phase of the build where a leg shipped claimed-complete and turned out to have a wheel-scroll-dead defect. The methodology needed more than caution from the builder or more attention from me. The methodology needed a physical handshake in the filesystem: a file ceases to exist on disk because the operator typed the command. The builder cannot fake that gesture.

The voice layer

This blog has voice rules. No em dashes. No fictional claims about awards or being-the-best at anything. No identity references. Vendor names allowed in body via named-excerpt blockquote only, never in titles, descriptions, slugs, categories, or structured data.

These are enforced by greps in a pre-commit hook. Try to commit a post with an em dash, and the commit fails. Try to put a vendor name in a slug, and the commit fails. The rules are enforced mechanically. They don't depend on me remembering them or on the builder showing restraint.
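Two of these rules are simple enough to sketch as a single check. This is an illustration only: `voice_violations` is a hypothetical name, `ExampleVendor` is a placeholder because the real vendor list is not given here, and the rule that allows vendor names in the body only inside named-excerpt blockquotes is left out.

```python
import re
from typing import List

EM_DASH = "\u2014"
VENDOR_NAMES = ["ExampleVendor"]  # placeholder; the real list is an unknown


def voice_violations(slug: str, title: str, body: str) -> List[str]:
    """Return human-readable rule violations for one post."""
    problems = []
    # Rule: no em dashes anywhere in the post.
    for label, text in (("slug", slug), ("title", title), ("body", body)):
        if EM_DASH in text:
            problems.append(f"em dash in {label}")
    # Rule: no vendor names in slugs or titles (case-insensitive match).
    for vendor in VENDOR_NAMES:
        pattern = re.compile(re.escape(vendor), re.IGNORECASE)
        for label, text in (("slug", slug), ("title", title)):
            if pattern.search(text):
                problems.append(f"vendor name '{vendor}' in {label}")
    return problems
```

A hook built on this would exit non-zero whenever the returned list is non-empty, which is what makes the commit fail.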

The voice greps are the fourth layer of an enforcement framework I rely on. The four layers are:

The voice greps catch voice violations mechanically.

The three-leg gate catches functional and perf regressions.

The semantic-versus-surface principle catches lying-surface-signals.

The halt-marker protocol catches premature-complete declarations.

Each layer catches a different class of failure. None depend on me being disciplined in the moment. They all enforce mechanically. The methodology survives forgetting, fatigue, and ambition.

The four layers compound. A piece of work has to pass all four to ship. The greps fire on commit. The gate fires on leg close. The semantic principle is baked into every functional probe. The halt marker holds everything until the operator confirms. By the time anything makes it through, it's been audited by multiple independent checks, none of which trust the other to have caught the failure.

What this looks like in practice

This blog is the working example of the methodology. Every post on it documents a specific moment where the methodology caught something that would have shipped broken otherwise. The build of the blog itself produced four V1.1 stub specs for issues identified during the work but deliberately not fixed in scope. Each stub captures what's owed, why it was deferred, and what would change if the deferral were revisited. The audit trail records what was tried, what worked, what was deferred, and why.

The first post on this blog is a complete teardown of one cycle where the methodology pulled itself back from a leg that had over-scoped. Worth reading if you want to see the gate working on a real defect. The posts after it document the methodology continuing to catch Phase-1-class issues at the gate, leg after leg.

You can read the worked examples in any order. They're written to stand alone but reinforce each other.

The honest version

This methodology is not faster than writing the code yourself if you can already do that. A senior front-end developer could have shipped the vibekoded.com landing page in two or three days. I shipped it over multiple weeks. The tradeoff is real.

What I get for the tradeoff is software I can stand behind a year from now without remembering how it works. The SPECs document every decision. The audit-trail entries document why every decision was made. The V1.1 stubs document what's still owed. A new operator, or me with no recent context, can pick up this codebase and understand the architecture in an afternoon.

The other thing I get is that I shipped it. Without this methodology, I do not write the code. I also don't hire a developer. The portfolio piece does not exist. Slower-than-an-expert-could-do-it is not the right comparison. The right comparison is shipped-versus-not-shipped.

If you can write the code yourself faster than you can write a spec and direct an agent, write the code yourself. This methodology is for people who can't, won't, or shouldn't. It produces real software in their hands.

Who this is for

People who want production software but don't write code. The methodology gives you a way to direct AI tools that doesn't reduce to "type and hope." You stay the operator. You don't become an accidental engineer.

Technical operators who want to multiply themselves with AI. The roles configuration scales. One operator can run a team of agents the same way one director can run a film crew. The discipline is what makes the multiplication compound rather than collapse.

People tired of "vibe coding" meaning "AI hopes for you." This is the version of vibe coding that ships software you can defend in a code review or in front of users. The energy is casual on the surface. The substance underneath is not.

Start with a SPEC. Direct your team. Confirm the gates. Ship.


If you have an idea for a production website and want it built through AI orchestration without the prototype-collapse problem, I can help. Send the rough idea, the audience, and any reference sites that hold up against what you want. VibeKoded can scope the build, ship the prototype, or hand off the production site. → Work with VibeKoded