I've built 9 apps with heavy AI assistance. The tools are genuinely good. They've made me faster, caught bugs I would have missed, and generated code I wouldn't have thought to write. But they also fail in predictable ways, and if you don't know the patterns, you'll burn hours on problems the AI created for you.
This isn't a "gotcha" list. These are failure modes I hit regularly, across multiple tools and models, and the specific things I do to prevent them.
## Confident Descriptions of Code That Doesn't Exist
This is the big one. You ask "what does the content loader do?" and the AI gives you a detailed, plausible explanation of code it hasn't read. It sounds right. The function names are reasonable. The logic makes sense. And none of it matches what's actually in the file.
I call this the "sounds right" problem. The AI is pattern-matching against what content loaders typically do, not what yours actually does. If your implementation is unusual in any way (and most real code is unusual in some way), the explanation will be wrong.
The fix: One rule in my CLAUDE.md eliminated about 80% of this: "Never speculate about code you have not opened. If the user references a specific file, read the file before answering." Simple, blunt, effective. The AI now reads the file first, every time. The explanations got longer (it quotes actual code) and more accurate.
This rule costs almost nothing. An extra file read takes a second. But without it, I was regularly catching the AI mid-hallucination and having to say "did you actually read that file?" The answer was usually no.
## Over-Engineering Everything
Left to its defaults, an AI coding tool will build you a cathedral when you asked for a shed.
Ask for a function that formats a date? You'll get a configurable date formatting utility with locale support, timezone handling, a fallback chain, and an options object with six parameters. The function you needed was three lines.
Ask for error handling? You'll get a custom error class hierarchy, a centralized error boundary, retry logic with exponential backoff, and a logging pipeline. You needed a try/catch.
The pattern is consistent: AI tools optimize for completeness, not simplicity. They're trained on codebases where the robust solution was the right answer. But in a new project, in a first sprint, the robust solution is over-engineering. You don't need retry logic when you haven't proven the happy path works yet.
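To make the contrast concrete, here's a sketch of the "three lines" version of the date formatter from the example above. The function name and format choices are hypothetical, not from any real codebase; the point is that it does one job with no options object, no locale plumbing, no fallback chain.

```typescript
// A minimal date formatter: the shed, not the cathedral.
// "formatPostDate" is an invented name for illustration.
function formatPostDate(iso: string): string {
  const opts: Intl.DateTimeFormatOptions = {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC", // deterministic output regardless of the runtime's zone
  };
  return new Date(iso).toLocaleDateString("en-US", opts);
}
```

If a second locale or format ever becomes a real requirement, that's the moment to generalize, not before.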
The fix: My CLAUDE.md has a blunt rule: "Make every task and code change as simple as possible. Avoid massive or complex changes. Every change should impact as little code as possible. Everything is about simplicity."
I also added: "Don't add error handling, fallbacks, or validation for scenarios that can't happen. Trust internal code and framework guarantees. Only validate at system boundaries."
These rules don't make the AI write bad code. They make it write appropriate code. The difference matters.
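As a sketch of what "only validate at system boundaries" means in practice (the function names here are hypothetical, invented for illustration): external input gets checked once where it enters, and internal code trusts its callers.

```typescript
// Boundary: a URL query parameter is untrusted external input, so validate it.
function parseQueryPage(raw: string | null): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : 1; // invalid or missing: default to page 1
}

// Internal: by the time a page number reaches this helper it has already been
// validated at the boundary, so no re-checking, no defensive fallbacks.
function pageLabel(page: number): string {
  return `Page ${page}`;
}
```

The over-engineered alternative re-validates `page` in every function that touches it, which is exactly the kind of "scenario that can't happen" handling the rule forbids.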
## The Twelve-File Refactor
Related to over-engineering, but worse: you ask for one change and the AI touches twelve files. It refactors the surrounding code "while it's in there." It adds type annotations to functions you didn't ask about. It renames variables for consistency. It improves the imports.
Each individual change might be fine. But the diff is now 200 lines across a dozen files when it should have been 15 lines in two files. Good luck reviewing that. Good luck reverting it if something breaks.
The fix: "Incremental development" rules. Every change should result in code that compiles and runs. Build in small increments. Test before moving on. And the simplicity rules above: impact as little code as possible, don't add improvements beyond what was asked.
When the AI does hand me a massive diff, I reject it and ask for just the change I requested. This happens less now with the CLAUDE.md rules, but it still happens. The instinct to "improve" adjacent code is deeply baked in.
## Suggesting Decisions You Already Made
"Have you considered using Vercel instead of Cloudflare Pages?"
Yes. I considered it. I decided. I moved on. The decision is documented. But the AI will periodically re-litigate settled decisions, especially if it thinks there's a "better" option. Tech stack choices, library selections, architectural patterns. All fair game for unsolicited suggestions.
This is annoying in the moment and dangerous over time. If the AI keeps suggesting alternatives, you start second-guessing decisions that were correct. Decision fatigue is real, and an AI that constantly reopens closed questions contributes to it.
The fix: My project CLAUDE.md has a section called "Key Decisions" with "Do Not Revisit" in the heading. It lists every major decision (tech stack, hosting, content approach, deployment model) as settled facts, not open questions.
```md
## Key Decisions (Do Not Revisit)

- **Tech stack:** React 19 + React Router v7 + Vite + Tailwind CSS v4
- **Hosting:** Cloudflare Pages (static, free tier)
- **Content:** MDX files compiled via @mdx-js/rollup
- **Backend:** None — fully static site
```

The AI reads this and stops suggesting alternatives. Simple, but it required me to realize the problem was happening and write the rule. The first few projects didn't have this section, and I wasted real time re-evaluating decisions that didn't need re-evaluation.
## Losing Track of What Works
This one is subtle. The AI fixes a bug, and in the process, it changes something that was working correctly. Not maliciously. It just doesn't know that the existing code was intentional.
Real example: this site had a hydration mismatch fix where all toLocaleDateString calls use timeZone: "UTC". I'd debugged this, found the root cause, and fixed it across the codebase. A few sessions later, the AI wrote a new date formatting call without the UTC timezone. The bug came back.
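A hypothetical illustration of the pattern from that example: the server renders dates in UTC, but a call without an explicit time zone falls back to the runtime's local zone, so the server's HTML and the client's hydrated output can disagree by a whole calendar day.

```typescript
const published = new Date("2024-03-15T00:00:00Z");

// Safe: pinned to UTC, identical output on server and client.
const safe = published.toLocaleDateString("en-US", { timeZone: "UTC" });

// Risky: uses the runtime's local time zone, so a visitor in UTC-5
// sees the previous calendar day and hydration no longer matches.
const risky = published.toLocaleDateString("en-US");
```

The fix is one extra option, which is exactly why it's so easy for a later session to drop it: nothing about the risky call looks wrong in isolation.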
The fix: The "Key Patterns" section in CLAUDE.md. Every hard-won fix gets documented there:
```md
## Key Patterns

- **Date formatting:** All `toLocaleDateString` calls use
  `timeZone: "UTC"` to prevent hydration mismatch
- **FadeIn hydration safety:** FadeIn component skips animation
  for elements already in viewport at mount time
- **Theme icon deferred:** Header uses `mounted` state to avoid
  hydration mismatch on theme toggle icon
```

These aren't code comments (though comments help too). They're in the CLAUDE.md so the AI sees them at the start of every session, before it writes any code. It's the difference between "here's a rule in a file you might read" and "here's a rule you will definitely read."
## When to Ignore the AI
Not every suggestion needs a rule to fix it. Sometimes the right move is to just say no.
The AI suggests adding TypeScript strict mode to a file that works fine? No. It wants to refactor a component into three smaller components? Not unless I asked. It recommends adding a testing library I haven't used? Not mid-sprint.
The judgment call is: does this suggestion serve the current task, or is it the AI optimizing for some abstract notion of code quality? If it's the latter, ignore it and move on. You can always come back to it later when you're not in the middle of building something.
The pattern I've landed on: trust the AI for execution within clear boundaries, but make the boundaries yourself. The CLAUDE.md is the boundary document. The sprint scope is the boundary. The architecture decisions are the boundary. Inside those boundaries, the AI is excellent. Outside them, it wanders.
## The Correction Flywheel
Every failure mode above has the same solution structure:
- Notice the pattern
- Write a rule in CLAUDE.md
- The AI follows the rule
- The problem mostly stops
"Mostly" because no rule is perfect and new failure modes emerge as you tackle new kinds of work. But the rate of correction compounds. By the third or fourth project, my CLAUDE.md had enough rules that new projects started with dramatically fewer problems. The failures from Project 1 don't happen in Project 9.
The key insight is that correcting AI tools is a one-time cost per failure mode. Correcting a human colleague is an ongoing conversation. Write the rule once, and every future session follows it. That's the real leverage of working with AI. Not that it's perfect, but that its mistakes are fixable in a way that scales.