Pillar 11: Knowing When NOT to Use AI

The best AI developers know when to put it down.

Every other pillar in this repository teaches you how to use AI well. This one teaches you when to stop. AI coding assistants create a gravitational pull toward using them for everything, and the beginner mindset treats AI like a hammer that makes everything look like a nail.

The real risk is not the obvious hallucinations that crash at compile time. As Simon Willison argues, the dangerous mistakes are the ones that are not instantly caught by the compiler or interpreter: subtle logic errors, security oversights, and architectural decisions that look reasonable on the surface but create compounding problems. IEEE Spectrum reported that AI coding quality has plateaued and in some cases declined, with “silent failures” emerging as the dominant risk category. The code runs. The tests pass. And the bug ships.

The data is sobering. A 2025 analysis of 470 pull requests found AI-generated code carries 1.7x more issues than human-written code, with logic errors 75% more common, error handling gaps 2x more frequent, and security vulnerabilities up to 2.7x higher. A large-scale study from arXiv (August 2025) confirmed that human-written code remains superior across every quality metric measured, despite being structurally more complex. Knowing when AI hurts more than it helps is a skill that separates experienced practitioners from beginners.

You recognize the task categories where AI consistently underperforms. AI struggles with: novel algorithms that require deep mathematical reasoning, security-critical code paths where subtle errors have outsized consequences, complex multi-system integrations where the AI cannot see the full picture, performance-sensitive code where naive implementations carry hidden costs, and domain-specific logic where the AI lacks training data.

MIT research mapped the specific roadblocks: AI fails on large codebases (millions of lines), struggles with global architectural coherence while generating locally correct code, and hallucinates code that looks plausible but violates internal conventions. A Microsoft senior engineer categorized the common failure types: code that does not compile, code that is overly convoluted, functions that contradict themselves, and hallucinations that invent nonexistent functions. When you encounter these patterns, slow down, write more of the code yourself, and use AI for specific sub-problems where you can verify the output.

You do not vibe code into production. “Vibe coding” (letting AI generate code you accept without fully understanding) has a place for throwaway prototypes and weekend experiments. Andrej Karpathy, who coined the term, said as much. It does not belong in production codebases.

Research from Kaspersky found that 45% of AI-generated code contains classic OWASP Top-10 vulnerabilities, and security deteriorates with iteration: after five modification rounds, code has 37% more critical vulnerabilities than it started with. Qodo’s 2025 research found that while 71% of developers say they won’t merge AI code without manual review, a significant portion of junior developers still deploy AI-generated code they do not fully understand. That is a liability, not a productivity gain. If you cannot explain what the code does, why it does it, and how it fails, it is not ready to ship.

You stop and reframe when the AI is struggling. If the AI takes too many iterations, produces contradictory outputs, or keeps regressing to the same incorrect pattern, that is a signal. The Axur engineering team’s recommendation holds: if the AI assistant takes too long or struggles with a complex prompt, stop it and reframe the problem. Break it into smaller pieces, provide more context, or switch to a different approach. Stubbornly iterating on a failing prompt is a time sink.

You are alert to the “looks right, is wrong” failure mode. AI-generated code that compiles and passes basic testing can still contain: functions that contradict themselves, overly convoluted implementations of simple problems, references to non-existent packages (slopsquatting), deprecated API patterns that work today but will break, and security vulnerabilities disguised in plausible-looking code.
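A minimal, invented illustration of this failure mode (the function and its bug are hypothetical, not taken from any study cited here): the code compiles, reads naturally, and passes a casual spot check, yet its logic contradicts its own documentation.

```python
# Illustrative (invented) example of "looks right, is wrong":
# the validator reads plausibly, but the comparison silently
# rejects the endpoints its docstring promises to accept.
def is_valid_percentage(value: float) -> bool:
    """Accept values from 0 to 100 inclusive."""
    return 0 < value < 100  # bug: strict comparisons exclude 0 and 100

# Only tracing the logic against the spec catches it:
print(is_valid_percentage(50))   # accepted, as expected
print(is_valid_percentage(100))  # rejected, despite "inclusive"
```

Nothing here would fail a compile step or a happy-path test; the contradiction surfaces only when a reviewer checks the boundary cases against the stated contract.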

A USENIX 2025 study testing 16 popular LLMs found that roughly 20% of AI-generated code references non-existent packages, with 43% of those hallucinated names repeating consistently across runs. The code looks correct. The imports look standard. The package names sound plausible. And none of it is real. Performance inefficiencies appear 8x more often in AI-generated code than human-written code, alongside logic errors, misconfigurations, and unsafe control flow.

You guard against automation bias and over-reliance. Automation bias is the documented tendency to favor AI recommendations even when contradictory evidence is present. Thoughtworks added complacency with AI-generated code to their Technology Radar as a recognized risk, noting that AI-driven confidence often comes at the expense of critical thinking, with automation bias, anchoring bias, and review fatigue all contributing.

In coding, this manifests as accepting AI output without tracing through the logic, deferring architectural decisions to the model, and losing the habit of critical evaluation. The METR study captured the perception gap perfectly: experienced developers believed AI made them 20% faster while measured outcomes showed they were 19% slower. You cannot trust your intuition about whether AI is helping. You need to measure it.
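Replacing intuition with measurement can start very small: log how long comparable tasks take with and without AI assistance and compare medians. A minimal sketch, with illustrative numbers and a hypothetical `relative_speedup` helper (neither comes from the METR study):

```python
# Sketch: compare median task durations with and without AI assistance.
# The helper and the sample numbers are illustrative assumptions.
import statistics

def relative_speedup(ai_minutes: list[float], solo_minutes: list[float]) -> float:
    """Positive means AI-assisted tasks had the lower median duration."""
    ai_med = statistics.median(ai_minutes)
    solo_med = statistics.median(solo_minutes)
    return (solo_med - ai_med) / solo_med

# Illustrative logs where the AI-assisted median is actually slower,
# echoing how perception and measurement can diverge.
print(relative_speedup(ai_minutes=[50, 62, 45], solo_minutes=[48, 55, 40]))
```

Even a crude log like this beats self-report: the METR result shows the felt speedup and the measured one can point in opposite directions.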

You maintain the ability to work without AI. If your productivity drops to near zero when your AI tool has an outage, that is a warning sign. You should be able to read code, debug, reason about architecture, and write implementations without AI assistance. AI is an accelerant, not a crutch.

Anthropic’s own research (2026) found developers who delegated code generation to AI scored 17% lower on comprehension tests. Research from MIT Media Lab found that prolonged AI use led to measurable declines in memory and weaker neural connectivity patterns compared to unassisted work. ICIS 2025 research confirmed that developer expertise is the primary factor mitigating hallucination impact, because experts have the baseline knowledge to catch mistakes that less experienced developers miss. Maintaining your fundamentals is not nostalgia; it is the safety net that makes AI collaboration viable.

Anti-patterns to watch for:

  • Using AI for every task regardless of whether it is a good fit
  • Vibe coding into production: accepting AI output you cannot explain or debug
  • Continuing to iterate on a failing AI interaction instead of stepping back and reframing
  • Trusting AI-generated security code without dedicated expert review
  • Not having a plan B when your AI tool is down or producing poor results
  • Letting AI choose frameworks, libraries, or architectures for domains you do not understand well enough to evaluate the choice
  • Assuming AI makes you faster without measuring actual outcomes (the METR perception gap)
  • Deploying AI-generated code that passes tests but has never been read by a human who understands the business logic