Teaching AI to Audit Itself

When you use AI to write code, you eventually run into an uncomfortable question: who’s checking the AI’s work? I was generating thousands of lines of code across multiple projects, and while most of it was good, some of it was quietly terrible. Not crash-your-app terrible — more like leave-a-door-unlocked-and-hope-nobody-notices terrible. So I built a system to catch it.

The Trust Problem

AI-generated code has a specific failure mode that human-written code doesn’t. When a human writes bad code, it’s usually because they don’t know better, and the patterns of ignorance are predictable. When AI writes bad code, it often looks perfectly reasonable on the surface. The syntax is clean, the variable names make sense, the logic flows. But underneath, there might be an unvalidated input, an overly permissive file access, or an error handler that silently swallows exceptions.

I started noticing these patterns across my projects. An API endpoint that didn’t sanitize query parameters. A file reader that could be tricked into reading outside its intended directory. A database query assembled with string concatenation instead of parameterized statements. Each one individually minor. Collectively, a minefield.

I needed a systematic way to catch these issues — not manually, because I was generating code faster than I could review it, but automatically, using the same AI that was creating the code in the first place.

Building the Auditor

The Claw Hub Auditor is a Python application that runs inside OrbStack (a lightweight container runtime for macOS). It takes a codebase as input, breaks it into analyzable chunks, and runs each chunk through a series of AI-powered and static analysis checks.
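To make the pipeline concrete, here’s a minimal sketch of the chunking step in Python. The chunk size and the Python-only file filter are assumptions for illustration, not the auditor’s actual settings.

```python
# Minimal sketch: break a codebase into analyzable chunks.
# CHUNK_LINES and the *.py filter are illustrative assumptions.
from pathlib import Path

CHUNK_LINES = 200

def iter_chunks(repo_root: str):
    """Yield (file path, starting line, chunk text) for every Python file."""
    for path in Path(repo_root).rglob("*.py"):
        lines = path.read_text(errors="ignore").splitlines()
        for start in range(0, len(lines), CHUNK_LINES):
            yield path, start + 1, "\n".join(lines[start:start + CHUNK_LINES])

if __name__ == "__main__":
    for path, line, chunk in iter_chunks("."):
        print(f"{path}:{line} ({len(chunk)} chars)")
```

Each chunk then flows through both the static analysis layer and the AI review layer described below.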

The static analysis layer is straightforward — it catches the obvious stuff. Hardcoded credentials, SQL injection vectors, insecure HTTP calls, missing input validation. These are pattern-matching problems, and tools like Bandit (for Python) and ESLint security plugins (for JavaScript) handle them well.
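As a rough illustration of that layer, here’s how shelling out to Bandit and collecting its JSON findings might look. This assumes Bandit is installed (`pip install bandit`) and is a sketch of the idea, not the auditor’s exact wiring.

```python
# Hedged sketch: run Bandit over a directory and normalize its JSON findings.
import json
import subprocess

def run_bandit(target_dir: str):
    proc = subprocess.run(
        ["bandit", "-r", target_dir, "-f", "json", "-q"],
        capture_output=True, text=True,
    )
    report = json.loads(proc.stdout or "{}")
    return [
        {
            "file": r["filename"],
            "line": r["line_number"],
            "severity": r["issue_severity"],
            "issue": r["issue_text"],
        }
        for r in report.get("results", [])
    ]

if __name__ == "__main__":
    for f in run_bandit("."):
        print(f"{f['severity']:>6}  {f['file']}:{f['line']}  {f['issue']}")
```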

The AI layer is where it gets interesting. I use Claude to review code with a specific set of security-focused prompts. But here’s the key insight: I don’t ask the AI “is this code secure?” That’s too vague, and the AI will almost always say “looks fine” unless there’s something obviously wrong. Instead, I ask specific, adversarial questions:

“If you were trying to exfiltrate data through this function, how would you do it?”

“What happens if the input to this function is 10MB of garbage?”

“If this error handler fails, what state is the application in?”

Adversarial prompting produces dramatically better results than asking the AI to “review for security issues.” When you ask the AI to think like an attacker, it actually finds things. When you ask it to review code, it writes you a book report.
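In code, the idea looks roughly like this, using the Anthropic Python SDK. The prompt wording mirrors the questions above, but the model name and message framing are illustrative rather than the auditor’s exact configuration.

```python
# Sketch of adversarial review prompts sent to Claude via the Anthropic SDK.
# The model name and prompt framing are assumptions for illustration.
import anthropic

ADVERSARIAL_QUESTIONS = [
    "If you were trying to exfiltrate data through this function, how would you do it?",
    "What happens if the input to this function is 10MB of garbage?",
    "If this error handler fails, what state is the application in?",
]

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def adversarial_review(code_chunk: str, model: str = "claude-3-5-sonnet-latest"):
    findings = []
    for question in ADVERSARIAL_QUESTIONS:
        response = client.messages.create(
            model=model,
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": f"{question}\n\nCode under review:\n{code_chunk}",
            }],
        )
        findings.append((question, response.content[0].text))
    return findings
```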

The Sandboxing Problem

Running untrusted code analysis in a sandbox sounds simple until you actually try to do it. The auditor needs to read the codebase, understand its dependencies, and sometimes execute test cases to verify its findings. But I can’t let it run arbitrary code on my actual machine — that defeats the entire purpose of a security audit.

OrbStack solved this elegantly. Each audit runs in an isolated container with a read-only copy of the codebase, no network access, and strict resource limits. The AI agent inside the container can analyze code, run static analysis tools, and generate reports, but it can’t phone home, modify the source, or escape its sandbox.
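Because OrbStack exposes a Docker-compatible CLI, the launch step can be sketched as a single locked-down `docker run`. The image name and module path below are hypothetical placeholders, not the real project’s.

```python
# Minimal sketch of launching one audit in an isolated, network-less container.
# "claw-hub-auditor" and "python -m auditor" are placeholder names.
import subprocess

def run_sandboxed_audit(codebase_dir: str, report_dir: str):
    cmd = [
        "docker", "run", "--rm",
        "--network=none",                       # no phoning home
        "--read-only",                          # container filesystem is immutable
        "--tmpfs", "/tmp",                      # scratch space only
        "-v", f"{codebase_dir}:/code:ro",       # read-only copy of the codebase
        "-v", f"{report_dir}:/reports",         # the only writable path: report output
        "--memory=2g", "--cpus=2", "--pids-limit=256",  # resource limits
        "claw-hub-auditor", "python", "-m", "auditor", "/code",
    ]
    return subprocess.run(cmd, check=True)
```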

The container setup was one of those problems that took ten times longer than I expected. Getting the right Python dependencies installed, making sure the static analysis tools worked in the container environment, handling file permissions correctly — it was all unglamorous infrastructure work. But it’s also the work that makes the difference between a toy project and something you can actually trust.

What the Auditor Found

When I first ran the auditor against my own projects, I expected it to find a few minor issues. It found 47 potential vulnerabilities across six codebases. Most were low severity — things like missing rate limiting on API endpoints or overly permissive CORS headers. But a handful were genuinely concerning: a path traversal vulnerability in a file upload handler, an authentication bypass in a session management module, and a SQL injection vector hiding behind what looked like a perfectly normal ORM call.

The path traversal one was particularly instructive. The code looked like this: it took a filename from user input, appended it to a base directory path, and read the file. The AI that wrote the code had included a comment saying “validates the filename” — but the validation only checked for null bytes, not for directory traversal sequences like ../. The auditor caught it because I asked it to try to read /etc/passwd through the function. It could.
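Reconstructed in simplified form (this is the shape of the bug, not the original code), the vulnerable pattern and the fix look something like this:

```python
# Simplified reconstruction of the path traversal bug and a resolved-path fix.
import os

BASE_DIR = "/srv/app/uploads"

def read_upload_vulnerable(filename: str) -> bytes:
    if "\x00" in filename:              # the only "validation": null bytes
        raise ValueError("invalid filename")
    return open(os.path.join(BASE_DIR, filename), "rb").read()

def read_upload_fixed(filename: str) -> bytes:
    # Resolve the final path and refuse anything that escapes BASE_DIR.
    base = os.path.realpath(BASE_DIR)
    path = os.path.realpath(os.path.join(base, filename))
    if os.path.commonpath([path, base]) != base:
        raise ValueError("path traversal attempt")
    return open(path, "rb").read()

# read_upload_vulnerable("../../../etc/passwd") reads /etc/passwd; the fixed version raises.
```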

The Meta Problem

There’s an obvious philosophical issue with using AI to audit AI-generated code. If the AI has blind spots in its code generation, won’t it have the same blind spots in its auditing? The answer is: sometimes, yes. But in practice, the auditing context is different enough from the generation context that the blind spots don’t fully overlap.

When the AI generates code, it’s trying to solve a functional problem — make this feature work, process this data, handle this request. Security is secondary. When the AI audits code, security is the primary lens. It’s looking for problems, not building features. That shift in perspective catches a lot of issues that the generation phase missed.

It’s not perfect. The auditor has its own false positives (it once flagged a perfectly safe list comprehension as a “potential denial of service vector”) and false negatives (it missed a timing side-channel that a human reviewer caught). But it’s a net positive, and it catches things that I would absolutely have missed on my own.

What I Learned

The biggest lesson from building the auditor is that AI security isn’t about making AI perfect. It’s about building systems where imperfect AI is still safe. The auditor doesn’t need to catch every vulnerability — it needs to catch enough of them that the remaining risk is manageable.

The second lesson is that the best way to get useful output from AI is to be specific and adversarial in your prompts. “Review this code” produces mediocre results. “Try to break this code in these five specific ways” produces actionable findings.

The third lesson is that infrastructure matters more than intelligence. The auditor’s sandboxing, containerization, and static analysis pipeline are more important than the AI prompts themselves. The AI is the brain, but the infrastructure is the immune system.

I still run the auditor on every project I build with AI. It’s become as routine as running tests. And every few weeks, it finds something that makes me glad I built it.

Michael Eisinger


Program manager, nonprofit founder, and LGBTQ+ travel writer based in Silver Spring, MD. I’ve spent over a decade managing programs across nonprofit, healthcare, and medical education — and another decade finding out where the bears go. I write about travel that’s real, destinations that are genuinely queer-friendly, and the places that changed how I see things.