The audit log that trusted the proxy

2026.06.26 // field-notes // 6 min read

I found the bug in the audit log, which stung, because the audit log was the part I had built to catch bugs. It looked complete. Every request was there: timestamped, with an IP address attached. And every single IP was identical. Thousands of requests, from a few hundred different people, all stamped with one address. The log had been writing down the same wrong answer, confidently, on every line, for months.

the part that passed every test

The app was a small custom tool I vibe coded for a handful of real users. Nothing exotic. It had the basics any app that touches real users needs once it leaves your laptop: rate limiting, so a single client can't hammer the endpoint, and an audit trail, so I can answer "who did this" after the fact. Both features key off the client's IP address. The rate limiter buckets requests by IP. The audit log stamps each action with the IP it came from.

In development this worked perfectly. I hit the endpoint from my machine, the log showed my address, the rate limiter counted my calls and cut me off at the limit. I wrote a couple of tests around it, they passed, I shipped it. The feature was done in every sense a test could measure.

the proxy in the middle

The tests couldn't see the one thing that mattered. In production the app doesn't sit on the open internet. It sits behind a reverse proxy, the way almost every deployed app does now, vibe coded or not. The proxy accepts the real connection from the user, then opens its own connection to the app. So when my code asked the network layer "what address is this request from," the honest answer was: the proxy. Every time. The user's real address was sitting in a header the proxy adds, the forwarded-for header, and my code never read it. It read the socket, got the proxy, wrote that down, and moved on.

So the rate limiter was sorting every user on the planet into one bucket: the proxy. That merger breaks the limiter in two directions, both bad. Either the limit never trips, because no single real user generates enough traffic to matter once everyone is merged into one identity, or it trips for all of them at once and throttles the whole app the moment that merged identity crosses the line. Mine was the first kind, which is the quieter kind. The rate limiter was running. It was returning numbers. It was protecting nothing.
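To make the merged bucket concrete, here is a minimal sketch in Python. It is not the app's actual limiter: `FixedWindowLimiter`, `simulate`, the request dicts, and every address in it are hypothetical, and the limiter counts inside a single window with no clock at all. It only shows how the choice of key changes the number of buckets.

```python
from collections import defaultdict

PROXY_ADDR = "10.0.0.5"  # hypothetical address of the reverse proxy


class FixedWindowLimiter:
    """Counts requests per key within one window (no clock, for illustration)."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, key):
        # Count the request first, then decide whether it is within the limit.
        self.counts[key] += 1
        return self.counts[key] <= self.limit


def simulate(requests, key_fn, limit):
    """Return (number of buckets, number of rejected requests)."""
    limiter = FixedWindowLimiter(limit)
    rejected = sum(1 for req in requests if not limiter.allow(key_fn(req)))
    return len(limiter.counts), rejected


# Three hypothetical users, four requests each, all arriving through the proxy.
REQUESTS = [
    {"socket_addr": PROXY_ADDR, "client_addr": client}
    for client in ("203.0.113.7", "198.51.100.9", "192.0.2.44")
    for _ in range(4)
]

# Same traffic, same limit; only the bucketing key differs.
by_socket = simulate(REQUESTS, lambda r: r["socket_addr"], limit=5)
by_client = simulate(REQUESTS, lambda r: r["client_addr"], limit=5)
print("keyed by socket:", by_socket)
print("keyed by client:", by_client)
```

Keyed by the socket address, the twelve requests land in a single bucket and seven of them are rejected, even though no individual user went over the limit. Keyed by the real client address, there are three buckets and nothing is rejected. With a more generous limit, the single bucket simply never trips, which is the quiet version.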

The audit trail had the same disease. It looked like a security feature. It was a list of actions attributed to an identity that was always wrong. If one of those users had done something I needed to trace, the log would have pointed at the proxy, which is to say it pointed at all of them and none of them. A vibe coded app can pass every functional check and still be wide open, and this is the exact shape it takes: not a crash, not an error, a feature that runs and reports success while quietly doing the opposite of its job.

verify in the real environment

The principle I keep relearning, and the one this cost me a few months of useless logs to relearn again, is this: verify in the real environment. Not the environment that's convenient to test in. The one the code actually runs in.

This bug survived because my test environment was missing the single component that defined the failure: the proxy. On my machine there was no middle layer, so the socket address was the user, so the code looked correct. The test wasn't wrong. It was answering a question about a world that didn't exist in production. Every assumption the code made about where a request came from was true in dev and false in prod, and no green test run will ever tell you that, because the test runs in the world where the assumption holds.

This is the part vibe coding makes easier to miss. When you orchestrate an agent to build a feature like rate limiting, it writes the textbook version, and the textbook version reads the socket address, because the textbook isn't sitting behind your proxy. It runs. It passes. It looks like the feature you asked for. Whether it survives contact with your actual deployment topology is a separate question, and the generator has no idea what your topology is. You have to go and check, in the place it really runs. Passing tests doesn't mean the app is fixed, and it doesn't mean the app is safe either. A test measures the code against the world it was tested in, nothing more.

What caught it, in the end, was looking at the real thing. I opened the production audit log to answer an unrelated question and saw the same IP stacked a few thousand times. That's the whole tell. If I had ever pulled up that log against real traffic, instead of trusting that "it logs IPs, good enough," I'd have seen it in the first week. The fix took ten minutes: read the forwarded-for header, trust only the hop you control, fall back to the socket if the header is absent. The expensive part was never the fix. It was the gap between shipping the feature and looking at it in the environment that counted.
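Those three rules are small enough to sketch. This is an illustration under stated assumptions, not the code from the app: it assumes the header is the conventional `X-Forwarded-For`, that exactly one proxy sits in front of the app, and that the proxy appends the address it saw to the end of the header. `TRUSTED_PROXIES`, `client_ip`, and the addresses are hypothetical names for the example.

```python
import ipaddress

# Hypothetical: the one hop we control. Only its word about the client is trusted.
TRUSTED_PROXIES = {"10.0.0.5"}


def client_ip(socket_addr, forwarded_for):
    """Pick the address to log and rate-limit on.

    socket_addr   -- peer address of the TCP connection
    forwarded_for -- raw X-Forwarded-For header value, or None if absent
    """
    # Header absent, or the connection did not come from our proxy:
    # the header proves nothing, so fall back to the socket.
    if not forwarded_for or socket_addr not in TRUSTED_PROXIES:
        return socket_addr

    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    if not hops:
        return socket_addr

    # With a single trusted proxy, the rightmost entry is the one our proxy
    # appended. Anything to its left was supplied by the client and can be forged.
    candidate = hops[-1]
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return socket_addr  # malformed entry: do not log garbage as an address
    return candidate


print(client_ip("10.0.0.5", "203.0.113.7"))
print(client_ip("10.0.0.5", "1.2.3.4, 203.0.113.7"))
print(client_ip("10.0.0.5", None))
```

The design choice that matters is reading from the right, not the left. A client can send its own forwarded-for header with any value it likes, and a proxy that appends will leave that forged value at the front. With more than one trusted hop (a CDN in front of a load balancer, say), the number of entries to skip from the right changes, so the sketch would need adjusting to the real topology.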

what to actually do

If you vibe code apps that handle real users, the move is to stop testing the security-relevant paths only where they're easy to test. Anything that depends on where a request came from, who a user is, or what the real origin was will behave differently behind a proxy, a CDN, or a load balancer, and your laptop has none of those in front of it.

Three things are worth doing before you trust an app like this with real traffic. First, exercise the security features against the deployed environment, not just localhost: hit the rate limiter from two genuinely different networks and confirm it counts them separately. Second, read your own audit log against real traffic at least once, with your own eyes, before you need it in an emergency. A log you have never actually read is a guess wearing a uniform. Third, when you ask an agent to build anything that reads request metadata, tell it your deployment shape up front, then audit the generated code for this class of assumption. That is far faster than reconstructing the problem later from a log full of one address.
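The second check can start as a few lines of script before it becomes a habit. A minimal sketch, assuming the audit log can be loaded as a list of dicts with an `ip` field; that shape, the `ip_summary` name, and the sample entries are all hypothetical, so adapt them to whatever the log really looks like.

```python
from collections import Counter


def ip_summary(entries):
    """Summarize how many distinct addresses appear in a batch of log entries."""
    counts = Counter(entry["ip"] for entry in entries)
    total = sum(counts.values())
    if total == 0:
        return {"total": 0, "distinct": 0, "top_ip": None, "top_share": 0.0}
    top_ip, top_count = counts.most_common(1)[0]
    return {
        "total": total,
        "distinct": len(counts),
        "top_ip": top_ip,
        # Share of all entries carrying the single most common address.
        "top_share": top_count / total,
    }


# Hypothetical sample: what a log looks like when every line records the proxy.
sample = [{"ip": "10.0.0.5", "action": "login"} for _ in range(4)]
print(ip_summary(sample))
```

One distinct address across traffic from many people, or a top share close to 1.0, is the same tell as the stacked IPs in the production log: the app is recording the hop in front of it, not the user.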

If you're a vibe coder shipping an app that real people log into, and you want someone to pressure-test where it's quietly trusting the wrong thing, head to /work-with-us. Send me the app and where it's deployed, and I can scope an audit pass that checks the security features against the environment they actually run in, or a spec-first rebuild that writes the deployment assumptions down before the code makes them silently. Work with VibeKoded.

The audit log is honest now. Different requests show different addresses, which sounds like nothing until you've watched the alternative scroll past, identical, for months. The feature was never the problem. The problem was that I had checked it everywhere except the one place it had to be true.
