How to stop AI from breaking your project

The premise most operators start with is wrong. They want to find the prompt, the setting, the agent configuration that will stop AI from making mistakes. There isn't one. AI coding tools will make mistakes. The question isn't how to prevent the mistakes; it's how to prevent the mistakes from breaking your project.

The shift is from preventing-the-failure to containing-the-failure. Preventing assumes a single layer (the one where the failure originates) can be made reliable enough. Containing accepts that no single layer is reliable enough and uses multiple layers, each catching a different class of failure that the previous layers missed.

This is defense in depth applied to AI coding. The four-layer enforcement framework below is what runs across my own production work. Each layer is small. The compounding effect is dramatic: failures that would have shipped through any single layer get caught by the combination.

Layer one: prompt-level constraints

The first layer is what you ask the AI to do. Most operators stop here, which is why most operators experience constant breakage.

Prompt-level constraints work for the failures that happen because the AI didn't know what you wanted. They don't work for the failures that happen because the AI knew what you wanted and produced wrong code anyway.

What this layer catches:

Specification gaps: features the AI built that weren't what you wanted because you didn't specify them clearly.

Scope confusion: changes the AI made beyond what was asked because the prompt didn't bound the scope.

Pattern misalignment: code in a style or pattern different from your codebase because the prompt didn't reference the existing conventions.

What it doesn't catch:

Bugs in correctly-specified code.

Subtle behavior differences where the AI's output technically matches the spec but doesn't match intent.

Failures that emerge only at integration time when the AI's output interacts with existing code.

Prompt-level constraints are necessary but not sufficient. They're the first layer, not the only layer.
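The three failure classes this layer catches map onto three parts of a prompt: what to build, where changes are allowed, and which conventions to follow. Below is a minimal sketch of a prompt builder that makes those parts explicit. The class name, field names, and example file paths are all hypothetical; the point is the structure, not the exact wording.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPrompt:
    """Hypothetical container for the parts of a bounded prompt."""

    goal: str  # addresses specification gaps
    files_in_scope: list[str] = field(default_factory=list)  # bounds the scope
    conventions: list[str] = field(default_factory=list)  # aligns patterns

    def render(self) -> str:
        lines = [f"Goal: {self.goal}", "", "Only modify these files:"]
        lines += [f"- {path}" for path in self.files_in_scope]
        lines += ["", "Do not change anything outside the files listed above."]
        lines += ["", "Follow these existing conventions:"]
        lines += [f"- {rule}" for rule in self.conventions]
        return "\n".join(lines)


# Example values are made up for illustration.
prompt = TaskPrompt(
    goal="Add server-side validation to the contact form email field.",
    files_in_scope=["src/forms/contact.py", "tests/test_contact.py"],
    conventions=[
        "Reuse the validators in src/forms/validators.py",
        "Raise ValidationError rather than returning error strings",
    ],
)
print(prompt.render())
```

A template like this only narrows what the AI is asked to do. Nothing in it stops the AI from touching other files, which is the job of the next layer.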

Layer two: scoped permissions

The second layer is what the AI can actually do, mechanically, regardless of what you asked.

In Claude Code, this means the --allowedTools and --disallowedTools flags, plus read-only mounts and an explicit working-directory restriction where the environment provides them. In Cursor, it is which files are indexed and how far edit suggestions are allowed to reach. In autonomous agents like Devin, it is the workspace boundary and the set of operations the agent has permission to invoke.

The principle: the AI shouldn't be able to make changes outside the scope of what was asked, even if it tries. The permission system makes this mechanical rather than promise-based.

What this layer catches:

Cross-cutting changes that the AI thought were helpful but you didn't ask for.

Edits to protected files (database migrations, deployment configs, security-sensitive code) where the consequences of a mistake are severe.

Operations the AI shouldn't be able to perform (running destructive commands, accessing external services, making commits without review).

What it doesn't catch:

Bugs within the scope of work the AI was authorized to do.

Subtle damage from operations that look benign but interact poorly with existing state.

Failures from the AI doing exactly what was authorized in a way that produces wrong outcomes.

Scoped permissions are the layer most operators skip because they take some configuration upfront. The configuration is worth doing once because it then prevents an entire class of damage indefinitely.
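As a rough sketch of what that one-time configuration can look like for Claude Code, the helper below assembles a command line using the two flags named above. The helper name is hypothetical, and the specific tool names and the way they are passed after each flag are assumptions to check against the CLI's own help output before relying on them. The code only builds and prints the command; it does not run it.

```python
import shlex


def build_claude_command(prompt, allowed_tools, disallowed_tools):
    """Build an argv list for a non-interactive Claude Code run.

    Assumption: tool names are passed as separate arguments after each flag.
    Verify the exact format against the CLI's --help output.
    """
    cmd = ["claude", "-p", prompt]
    if allowed_tools:
        cmd += ["--allowedTools", *allowed_tools]
    if disallowed_tools:
        cmd += ["--disallowedTools", *disallowed_tools]
    return cmd


# Illustrative tool names: read and search only, with edits and shell denied.
cmd = build_claude_command(
    "Summarize how the contact form validates input.",
    allowed_tools=["Read", "Grep", "Glob"],
    disallowed_tools=["Edit", "Write", "Bash"],
)
print(shlex.join(cmd))  # inspect the command before running it yourself
```

Keeping the allow and deny lists in one reviewed place, rather than retyping them per session, is what makes the restriction mechanical.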

Layer three: automated gates

The third layer is what runs against the AI's output before that output is considered done.

This is the test suite, the linter, the type checker, the build verification, the deployment validation, the pre-commit hooks. Each gate runs without operator intervention and catches a specific class of failure.

What this layer catches:

Code that doesn't compile or has type errors.

Code that violates project conventions caught by linters.

Code whose tests don't pass.

Code that breaks the build process.

Code that has known security or quality issues caught by automated scanners.

Code that introduces patterns the codebase has explicitly forbidden (vendor names where they shouldn't be, em-dashes in voice-controlled prose, hardcoded credentials, etc.).
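The forbidden-pattern gate in the last item is usually the cheapest one to write. Here is a minimal sketch, assuming a hypothetical rule set with two of the patterns mentioned above; the rule names and the credential regex are illustrative and would need tuning for a real codebase.

```python
import re

# Hypothetical rule set: each name maps to a pattern the project forbids.
FORBIDDEN = {
    "em-dash": re.compile("\u2014"),
    "hardcoded credential": re.compile(
        r"""\b(api_key|password|secret)\b\s*=\s*["'][^"']+["']""",
        re.IGNORECASE,
    ),
}


def find_forbidden(text):
    """Return (line_number, rule_name) for every forbidden pattern found."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in FORBIDDEN.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


sample = 'title = "Launch"\napi_key = "abc123"\nnote = "fast \u2014 and cheap"\n'
for number, name in find_forbidden(sample):
    print(f"line {number}: {name}")
```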

What it doesn't catch:

Bugs in code that compiles, type-checks, and passes existing tests.

Failures that need new tests to catch (the existing test suite is necessarily incomplete).

Surface-vs-semantic mismatches where the surface tests pass but the underlying behavior is wrong.

Logical errors the linter and type checker can't see.

Automated gates are the layer that scales because they run on every change without operator effort. The investment is in setting them up well; the benefit compounds over every subsequent change.
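One way to wire several gates together is a small runner that executes each command and reports which ones failed. This is a sketch, not the configuration used on this site: the function name is hypothetical, and the demo commands are stand-ins so the example runs anywhere. In a real project they would be the project's own test, lint, and type-check commands.

```python
import subprocess
import sys


def run_gates(gates):
    """Run each gate command in order and return the names of gates that failed.

    `gates` maps a gate name to an argv list. A gate fails on a non-zero exit code.
    """
    failed = []
    for name, argv in gates.items():
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
    return failed


# Stand-in commands: one exits with 0, the other with 1.
demo_gates = {
    "passing gate": [sys.executable, "-c", "raise SystemExit(0)"],
    "failing gate": [sys.executable, "-c", "raise SystemExit(1)"],
}

failed = run_gates(demo_gates)
print("blocked by:", failed if failed else "nothing, all gates passed")
```

Called from a pre-commit hook, a script like this would exit non-zero whenever the returned list is non-empty, so the commit is refused without anyone having to remember to run the checks.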

I documented this layer in detail in the four-layer enforcement framework, including the production specifics of how each gate is configured on this site.

Layer four: manual review

The fourth layer is human judgment applied to the output that passed the previous three layers.

This is the operator reading the diff, understanding what changed, verifying the change matches intent, checking the things the automated gates can't check. It's the slowest layer and the most expensive. It's also the layer that catches what nothing else can.

What this layer catches:

The surface-vs-semantic mismatches that look correct but aren't.

The architectural drift that's locally reasonable but globally incoherent.

The intent gaps where the code does something different from what you wanted, even though it does something valid.

The judgment calls that AI can't make and tests can't encode.

What it doesn't catch:

What the reviewer doesn't look at carefully (review fatigue is real).

What requires context the reviewer doesn't have (the reviewer needs to understand the codebase).

What manifests later than the review window (issues that only emerge under load, with real data, over time).

Manual review is the most variable layer because it depends on reviewer attention and skill. The discipline is to make it focused (small, scoped changes that can actually be reviewed thoroughly) rather than broad (massive changes that are reviewed superficially because thoroughness would take too long).
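Keeping review focused can be given a mechanical nudge: measure the diff before reviewing it and split the change if it is too large. The sketch below parses the output of git diff --numstat, which prints added lines, deleted lines, and the path separated by tabs, with "-" in place of the counts for binary files. The function names and the thresholds of 5 files and 200 lines are arbitrary assumptions, not recommendations from measured data.

```python
def review_size(numstat):
    """Return (files_changed, lines_changed) from `git diff --numstat` output.

    Binary files report "-" for both counts and are counted as files
    contributing zero lines.
    """
    files = 0
    lines = 0
    for row in numstat.splitlines():
        if not row.strip():
            continue
        added, deleted, _path = row.split("\t", 2)
        files += 1
        if added != "-":
            lines += int(added) + int(deleted)
    return files, lines


def too_big_to_review(numstat, max_files=5, max_lines=200):
    """True when the diff exceeds either (assumed) review budget."""
    files, lines = review_size(numstat)
    return files > max_files or lines > max_lines


# Sample numstat text with made-up paths.
sample = (
    "12\t3\tsrc/forms/contact.py\n"
    "40\t0\ttests/test_contact.py\n"
    "-\t-\tassets/logo.png\n"
)
print(review_size(sample))
print(too_big_to_review(sample))
```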

Why all four layers

Each layer has gaps. What makes the combination work is that the gaps of one layer mostly fall within the strengths of another.

Prompt-level constraints catch what you knew to ask about. Permissions catch what you didn't trust the AI to do. Gates catch what's measurable mechanically. Manual review catches what requires judgment.

A failure that emerged from intent ambiguity gets caught by manual review when prompt constraints failed. A failure that came from a mistake in authorized scope gets caught by gates when permissions allowed the operation but the operation was wrong. A failure in code that compiles cleanly gets caught by manual review when gates were satisfied but the code wasn't right.

The framework doesn't promise zero failures. It aims to catch failures at some layer before they reach production. The compounding effect of four layers, each with its own failure modes, is much higher reliability than any single layer can provide.
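The compounding can be made concrete with a little arithmetic. If each layer misses some fraction of failures, and the misses are independent, the fraction that slips past every layer is the product of the individual miss rates. The numbers below are made up purely for illustration, and the independence assumption rarely holds exactly: layers that tend to miss the same failures compound less well than this suggests.

```python
import math


def combined_miss_rate(miss_rates):
    """Probability that a failure slips past every layer, assuming independence."""
    return math.prod(miss_rates)


# Illustrative, made-up miss rates for layers one through four.
layers = [0.5, 0.4, 0.3, 0.2]
print(f"best single layer misses {min(layers):.0%}")
print(f"all four layers together miss {combined_miss_rate(layers):.1%}")
```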

How to install the framework

You don't have to install all four layers at once. This order produces the most leverage soonest:

Start with layer three (automated gates). Set up the tests, linter, type checker, pre-commit hooks. This is one-time work that pays back on every subsequent change. It also gives you a baseline of what's currently passing versus failing.

Add layer two (scoped permissions) next. Configure your AI tools' permissions to match what you actually want them doing. This prevents an entire class of damage and forces you to think about scope deliberately.

Improve layer one (prompts) as you go. Each prompt that produced wrong output is a chance to improve the next prompt. Build a personal collection of prompts that worked well and patterns that didn't.

Practice layer four (manual review) deliberately. Look at every AI-generated diff before accepting it. Develop the eye for surface-vs-semantic mismatches and architectural drift. This skill compounds.

The whole framework can be installed over a few weeks of deliberate practice. Once installed, it stays in place across projects. The leverage is enormous because the framework catches failures before they ship instead of leaving you to recover from them afterward.

This is the same discipline documented in "how to vibe code a production landing page", applied at a higher level of abstraction. The specific tools matter less than the layered structure.

What this looks like in practice

I run this framework on my own work. The layers are: my spec writing (layer one), Claude Code's permission configuration (layer two), the pre-commit hooks plus the recursive validator plus the codified template (layer three), and my own review of every diff before promotion (layer four).

Failures still happen. They get caught at one of the four layers before reaching production. The codified template I maintain has now run through dozens of consecutive applications without a missed promotion gate, because the layered defense catches what would have slipped through any single layer.

The framework isn't theoretical. It's what makes AI coding actually viable for production work. Without it, the breakage rate makes the speed advantage irrelevant. With it, the speed is preserved and the breakage stays bounded.

