Pillar 6: Verification and Security
Never trust AI output blindly. Review everything. Trust nothing by default.
AI creates dangerous overconfidence. The output looks professional, compiles cleanly, and passes a quick glance. Developers clear the initial hurdle, produce something that appears complete, and miss the optimizations, security vulnerabilities, and edge cases hiding beneath the surface. They think they are at the finish line when they are barely out of the starting blocks.
The data backs this up. On SWE-bench Pro, top models solve roughly 46% of real-world software engineering tasks. A 2025 analysis of 470 pull requests found AI-generated code carries 1.7x more issues than human-written code, with security vulnerabilities up to 2.7x higher. The gap between “code that compiles” and “code that’s production-ready” is where your expertise earns its keep.
If you cannot read and evaluate what is being generated, you are creating a ticking time bomb. You must understand every line running in production.
What We Expect
You understand every piece of code that ships. You can have AI write code, but you must understand what it does. This is non-negotiable. If you cannot explain the logic, the dependencies, and the failure modes, the code is not ready to ship. AI is a collaborator, not a replacement for comprehension.
Research from Checkmarx shows that AI-generated code often receives less careful checking than human-written code, creating serious security risks. The speed at which AI produces code is the structural problem: it generates far faster than you can reason about it, and reviewing output is not the same as truly understanding it.
You never trust AI for security-sensitive decisions. Never assume AI-generated code is secure. Authentication, authorization, input validation, secrets management, and cryptographic operations all require explicit human review by someone who understands the security implications. AI can generate security code, but the review must be human.
You verify against specific failure modes. When reviewing AI output, check for: hallucinated dependencies, deprecated API usage (the AI may have been trained on older versions), hardcoded values that should be configurable, missing error handling and edge cases, and incorrect assumptions about the runtime environment.
Slopsquatting is a real and growing supply chain threat: research shows roughly 20% of LLM code samples recommend non-existent packages, and attackers register those phantom names with malicious payloads. Verify every dependency actually exists and does what you expect. Research in December 2025 uncovered 30+ security flaws in AI coding tools themselves, including Cursor, GitHub Copilot, and Claude Code. The tools you use to write code are themselves an attack surface.
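Dependency existence can be checked mechanically before anything is installed. A minimal sketch in Python, using PyPI’s public JSON metadata endpoint (the helper names here are illustrative, not a standard tool):

```python
import urllib.error
import urllib.request


def pypi_metadata_url(name: str) -> str:
    """Build the public PyPI JSON metadata URL for a package name."""
    return f"https://pypi.org/pypi/{name}/json"


def package_exists_on_pypi(name: str) -> bool:
    """Return True if the package name is registered on PyPI.

    A 404 means the name does not exist -- a strong signal the AI
    hallucinated the dependency. Existence alone proves little:
    still check maintenance, download counts, and the source itself.
    """
    try:
        with urllib.request.urlopen(pypi_metadata_url(name), timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False
        raise
```

Note that an attacker-registered phantom package would pass this check, so treat it as a first filter: it catches hallucinated names, not malicious ones.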
You use adversarial review across models. Different AI models have uncorrelated blind spots. Using a second model to review the first model’s output catches 3-5x more issues than single-pass review. This is not about replacing human review; it’s about making the code that reaches human review already stronger. Have one model generate, another critique, and a human make the final call.
Be aware of the limitations of LLM-as-judge approaches: model judges are vulnerable to adversarial manipulation and struggle when evaluation requires external context. Adversarial review is a filter, not a guarantee.
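The generate/critique/human pipeline described above can be sketched as plain orchestration, independent of any particular model API. The callables and the result shape below are illustrative assumptions, not a real library:

```python
from typing import Callable


def adversarial_review(
    task: str,
    generate: Callable[[str], str],       # e.g. a call to model A
    critique: Callable[[str, str], str],  # e.g. a call to a *different* model B
) -> dict:
    """One generate-then-critique round.

    `generate` and `critique` stand in for calls to two different
    models with uncorrelated blind spots. The result is explicitly
    marked as pending: a human still makes the final call.
    """
    code = generate(task)
    findings = critique(task, code)
    return {
        "task": task,
        "code": code,
        "findings": findings,
        "status": "awaiting human review",  # never auto-merge
    }
```

Keeping the models behind plain callables makes it easy to swap providers and to test the pipeline itself with stubs.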
You use version control aggressively. Commit early and often. AI-assisted development makes rollback points critical because you need clean boundaries when AI-generated changes don’t work out. Never skip source control, and never let AI run unchecked without periodic commits.
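For tool-driven sessions, checkpointing can be scripted so a rollback point exists before each round of AI changes. A minimal sketch wrapping the `git` CLI (the `checkpoint:` message prefix is just a convention, not a standard):

```python
import subprocess


def checkpoint(message: str) -> None:
    """Stage everything and commit, creating a clean rollback point
    before letting an AI tool make further changes."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", f"checkpoint: {message}"], check=True)


def rollback_to(ref: str) -> None:
    """Discard AI-generated changes by resetting to a known-good commit."""
    subprocess.run(["git", "reset", "--hard", ref], check=True)
```

Calling `checkpoint()` before each AI iteration gives you the clean boundaries that make “throw away the last attempt” a one-line operation instead of an archaeology project.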
You review the AI’s approach, not just its output. Ask the AI to explain its code: “Explain the purpose and step-by-step logic. Include alternative implementations considered and why this choice was made.” This surfaces assumptions and trade-offs that are invisible in the code alone.
Your team has a governance policy for AI-generated code. At the team or organization level, define which AI tools are permitted, what security review is required before merge, and who is accountable when AI-generated code causes an incident. This is not bureaucracy; it’s the same discipline you apply to any third-party code entering your codebase. As regulatory frameworks like the EU AI Act take full effect, organizations without governance policies will face compliance gaps alongside security ones.
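Parts of such a policy can be encoded as data and enforced mechanically in CI. An illustrative sketch, where the field names, tool names, and path prefixes are all hypothetical placeholders for your own policy:

```python
# Hypothetical policy data: adapt tool names and path prefixes to your org.
AI_POLICY = {
    "approved_tools": {"GitHub Copilot", "Claude Code"},
    "security_paths": ("auth/", "crypto/", "payments/"),
}


def required_reviews(tool: str, touched_paths: list[str]) -> list[str]:
    """Map an AI-assisted change to the reviews it must pass before merge."""
    reviews = ["standard human code review"]  # always required
    if tool not in AI_POLICY["approved_tools"]:
        reviews.append("unapproved tool: escalate per governance policy")
    if any(p.startswith(AI_POLICY["security_paths"]) for p in touched_paths):
        reviews.append("security-sensitive path: dedicated security review")
    return reviews
```

A check like this can run in CI against the PR description and diff, turning the policy document into an enforced gate rather than a wiki page.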
Anti-patterns
- Accepting code that compiles and “looks right” without tracing through the logic
- Using unfamiliar frameworks based on AI recommendation without ability to verify or debug the output
- Letting AI handle security-sensitive code paths without dedicated human review
- Skipping version control because “I can always regenerate it”
- Not specifying framework/library versions, leading to AI generating code against deprecated APIs
- “Proof by AI”: using “the AI suggested it” as justification in a code review
- Installing AI-suggested dependencies without verifying they exist and are maintained (slopsquatting)
- Relying on a single model to both generate and validate its own output
- Having no organizational policy on which AI tools are approved and what review is required
Resources
Benchmarks and Data
- SWE-bench Pro Leaderboard - Real-world pass rates for AI coding models on rigorous software engineering tasks
- CodeRabbit: AI vs. Human Code Quality (2025) - Analysis of 470 PRs showing 1.7x more issues in AI-generated code
Security and Supply Chain
- OWASP Top 10 for LLM Applications - Industry-standard framework for LLM security risks
- Checkmarx: Why AI-Generated Code May Be Less Secure - How AI code bypasses security scrutiny
- 30+ Flaws in AI Coding Tools (December 2025) - CVE-tracked vulnerabilities in Cursor, Copilot, Claude Code, and Zed
- Slopsquatting: AI-Hallucinated Packages (2025) - Supply chain attacks exploiting LLM hallucinations
Adversarial Review and Governance
- Multi-Model AI Code Review (2026) - Research showing 3-5x bug detection improvement with cross-model review
- LLM-as-Judge Robustness (2025) - Limitations and vulnerabilities of using LLMs to evaluate LLM output
- 2025 CISO Guide to Securing AI-Generated Code - Governance frameworks for AI coding in enterprise
- Anthropic: Measuring Agent Autonomy - Research on when and how to trust AI agent output
Related Pillars
- Pillar 5: Guardrails and Quality - The automated layer that catches issues before review
- See Learning Paths for deeper dives