Pillar 6: Verification and Security
Never trust AI output blindly. Review everything. Trust nothing by default.
AI creates dangerous overconfidence. The output looks professional, compiles cleanly, and passes a quick glance. Developers clear the initial hurdle, produce something that appears complete, and miss the optimizations, security vulnerabilities, and edge cases hiding beneath the surface. They think they are at the finish line when they are barely out of the starting blocks.
The data backs this up. On SWE-bench Pro, top models solve roughly 46% of real-world software engineering tasks. A 2025 analysis of 470 pull requests found AI-generated code carries 1.7x more issues than human-written code, with security vulnerabilities up to 2.7x higher. The gap between “code that compiles” and “code that’s production-ready” is where your expertise earns its keep.
If you cannot read and evaluate what is being generated, you are creating a ticking time bomb. You must understand every line running in production.
What We Expect
You understand every piece of code that ships. You can have AI write code, but you must understand what it does. This is non-negotiable. If you cannot explain the logic, the dependencies, and the failure modes, the code is not ready to ship. AI is a collaborator, not a replacement for comprehension.
Research from Checkmarx shows that AI-generated code often receives less careful checking than human-written code, creating serious security risks. The speed at which AI produces code is the structural problem: it generates far faster than you can reason about it, and reviewing output is not the same as truly understanding it.
You never trust AI for security-sensitive decisions. Never assume AI-generated code is secure. Authentication, authorization, input validation, secrets management, and cryptographic operations all require explicit human review by someone who understands the security implications. AI can generate security code, but the review must be human.
You verify against specific failure modes. When reviewing AI output, check for: hallucinated dependencies, deprecated API usage (the AI may have been trained on older versions), hardcoded values that should be configurable, missing error handling and edge cases, and incorrect assumptions about the runtime environment.
Slopsquatting is a real and growing supply chain threat: research shows roughly 20% of LLM code samples recommend non-existent packages, and attackers register those phantom names with malicious payloads. Verify every dependency actually exists and does what you expect. Research in December 2025 uncovered 30+ security flaws in AI coding tools themselves, including Cursor, GitHub Copilot, and Claude Code. The tools you use to write code are themselves an attack surface.
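Dependency existence can be checked mechanically before anything is installed. A minimal sketch in Python, using PyPI’s public JSON metadata endpoint (the helper names here are illustrative, not a standard tool):

```python
import urllib.error
import urllib.request


def pypi_metadata_url(name: str) -> str:
    """Build the public PyPI JSON metadata URL for a package name."""
    return f"https://pypi.org/pypi/{name}/json"


def package_exists_on_pypi(name: str) -> bool:
    """Return True if the package name is registered on PyPI.

    A 404 means the name does not exist -- a strong signal the AI
    hallucinated the dependency. Existence alone proves little:
    still check maintenance, download counts, and the source itself.
    """
    try:
        with urllib.request.urlopen(pypi_metadata_url(name), timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False
        raise
```

Note that an attacker-registered phantom package would pass this check, so treat it as a first filter: it catches hallucinated names, not malicious ones.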
You use adversarial review across models. Different AI models have uncorrelated blind spots. Using a second model to review the first model’s output catches 3-5x more issues than single-pass review. This is not about replacing human review; it’s about making the code that reaches human review already stronger. Have one model generate, another critique, and a human make the final call.
Be aware of the limitations of LLM-as-judge approaches: model judges are vulnerable to adversarial manipulation and struggle when evaluation requires external context. Adversarial review is a filter, not a guarantee.
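The generate/critique/human pipeline described above can be sketched as plain orchestration, independent of any particular model API. The callables and the result shape below are illustrative assumptions, not a real library:

```python
from typing import Callable


def adversarial_review(
    task: str,
    generate: Callable[[str], str],       # e.g. a call to model A
    critique: Callable[[str, str], str],  # e.g. a call to a *different* model B
) -> dict:
    """One generate-then-critique round.

    `generate` and `critique` stand in for calls to two different
    models with uncorrelated blind spots. The result is explicitly
    marked as pending: a human still makes the final call.
    """
    code = generate(task)
    findings = critique(task, code)
    return {
        "task": task,
        "code": code,
        "findings": findings,
        "status": "awaiting human review",  # never auto-merge
    }
```

Keeping the models behind plain callables makes it easy to swap providers and to test the pipeline itself with stubs.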
You use version control aggressively. Commit early and often. AI-assisted development makes rollback points critical because you need clean boundaries when AI-generated changes don’t work out. Never skip source control, and never let AI run unchecked without periodic commits.
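For tool-driven sessions, checkpointing can be scripted so a rollback point exists before each round of AI changes. A minimal sketch wrapping the `git` CLI (the `checkpoint:` message prefix is just a convention, not a standard):

```python
import subprocess


def checkpoint(message: str) -> None:
    """Stage everything and commit, creating a clean rollback point
    before letting an AI tool make further changes."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", f"checkpoint: {message}"], check=True)


def rollback_to(ref: str) -> None:
    """Discard AI-generated changes by resetting to a known-good commit."""
    subprocess.run(["git", "reset", "--hard", ref], check=True)
```

Calling `checkpoint()` before each AI iteration gives you the clean boundaries that make “throw away the last attempt” a one-line operation instead of an archaeology project.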
You review the AI’s approach, not just its output. Ask the AI to explain its code: “Explain the purpose and step-by-step logic. Include alternative implementations considered and why this choice was made.” This surfaces assumptions and trade-offs that are invisible in the code alone.
Your team has a governance policy for AI-generated code. At the team or organization level, define which AI tools are permitted, what security review is required before merge, and who is accountable when AI-generated code causes an incident. This is not bureaucracy; it’s the same discipline you apply to any third-party code entering your codebase. As regulatory frameworks like the EU AI Act take full effect, organizations without governance policies will face compliance gaps alongside security ones.
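Parts of such a policy can be encoded as data and enforced mechanically in CI. An illustrative sketch, where the field names, tool names, and path prefixes are all hypothetical placeholders for your own policy:

```python
# Hypothetical policy data: adapt tool names and path prefixes to your org.
AI_POLICY = {
    "approved_tools": {"GitHub Copilot", "Claude Code"},
    "security_paths": ("auth/", "crypto/", "payments/"),
}


def required_reviews(tool: str, touched_paths: list[str]) -> list[str]:
    """Map an AI-assisted change to the reviews it must pass before merge."""
    reviews = ["standard human code review"]  # always required
    if tool not in AI_POLICY["approved_tools"]:
        reviews.append("unapproved tool: escalate per governance policy")
    if any(p.startswith(AI_POLICY["security_paths"]) for p in touched_paths):
        reviews.append("security-sensitive path: dedicated security review")
    return reviews
```

A check like this can run in CI against the PR description and diff, turning the policy document into an enforced gate rather than a wiki page.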
Anti-patterns
- Accepting code that compiles and “looks right” without tracing through the logic
- Using unfamiliar frameworks based on AI recommendation without ability to verify or debug the output
- Letting AI handle security-sensitive code paths without dedicated human review
- Skipping version control because “I can always regenerate it”
- Not specifying framework/library versions, leading to AI generating code against deprecated APIs
- “Proof by AI”: using “the AI suggested it” as justification in a code review
- Installing AI-suggested dependencies without verifying they exist and are maintained (slopsquatting)
- Relying on a single model to both generate and validate its own output
- Having no organizational policy on which AI tools are approved and what review is required
Resources
Benchmarks and Data
- SWE-bench Pro Leaderboard - Real-world pass rates for AI coding models on rigorous software engineering tasks
- CodeRabbit: AI vs. Human Code Quality (2025) - Analysis of 470 PRs showing 1.7x more issues in AI-generated code
Security and Supply Chain
- OWASP Top 10 for LLM Applications - Industry-standard framework for LLM security risks
- Checkmarx: Why AI-Generated Code May Be Less Secure - How AI code bypasses security scrutiny
- 30+ Flaws in AI Coding Tools (December 2025) - CVE-tracked vulnerabilities in Cursor, Copilot, Claude Code, and Zed
- Slopsquatting: AI-Hallucinated Packages (2025) - Supply chain attacks exploiting LLM hallucinations
Adversarial Review and Governance
- Multi-Model AI Code Review (2026) - Research showing 3-5x bug detection improvement with cross-model review
- LLM-as-Judge Robustness (2025) - Limitations and vulnerabilities of using LLMs to evaluate LLM output
- 2025 CISO Guide to Securing AI-Generated Code - Governance frameworks for AI coding in enterprise
- Anthropic: Measuring Agent Autonomy - Research on when and how to trust AI agent output
Related Pillars
- Pillar 5: Guardrails and Quality - The automated layer that catches issues before review
- See Learning Paths for deeper dives