Pillar 6: Verification and Security

Never trust AI output blindly. Review every line; trust nothing by default.

AI creates dangerous overconfidence. The output looks professional, compiles cleanly, and passes a quick glance. People can get past the initial hurdle and produce something that appears complete, then miss optimizations, security vulnerabilities, and edge cases. They think they are at the finish line when they are barely out of the starting blocks.

The data backs this up. On the SWE-bench Pro Q2 2026 leaderboard, top models solve roughly 50-60% of real-world software engineering tasks. A 2025 analysis of 470 pull requests found AI-generated code carries 1.7x more issues than human-written code, with security vulnerabilities up to 2.7x higher. The gap between "code that compiles" and "code that's production-ready" is where your expertise earns its keep.

If you cannot read and evaluate what is being generated, you are creating a ticking time bomb. You must understand every line running in production.

You understand every piece of code that ships

You can have AI write code, but you must understand what it does. This is non-negotiable. If you cannot explain the logic, the dependencies, and the failure modes, the code is not ready to ship. AI is a collaborator, not a replacement for comprehension.

Research from Checkmarx shows that AI-generated code often receives less careful checking than human-written code, creating serious security risks. The speed at which AI produces code is the structural problem: it generates far faster than you can reason about it, and reviewing output is not the same as truly understanding it.

Never assume AI-generated code is secure. Authentication, authorization, input validation, secrets management, and cryptographic operations all require explicit human review by someone who understands the security implications. AI can generate security code, but the review must be human.
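
One way to make that review hard to skip is to flag security-sensitive changes automatically, so a human is always asked to look. The sketch below is a minimal illustration, not a complete classifier: the category names echo the list above, while the glob patterns and the function name `required_security_reviews` are hypothetical and would need adapting to your repository layout.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns (an assumption, not a standard); tune them per repository.
SECURITY_SENSITIVE_PATTERNS = {
    "authentication/authorization": ["*auth*", "*login*", "*permission*"],
    "secrets management": ["*secret*", "*.env", "*credential*"],
    "cryptography": ["*crypto*", "*cipher*", "*token*"],
}


def required_security_reviews(changed_paths):
    """Map each changed path to the human review categories it triggers."""
    flagged = {}
    for path in changed_paths:
        lowered = path.lower()
        categories = [
            category
            for category, patterns in SECURITY_SENSITIVE_PATTERNS.items()
            if any(fnmatchcase(lowered, pattern) for pattern in patterns)
        ]
        if categories:
            flagged[path] = categories
    return flagged


if __name__ == "__main__":
    changed = ["src/auth/session.py", "README.md", "config/.env"]
    for path, categories in required_security_reviews(changed).items():
        print(f"{path}: needs human review for {', '.join(categories)}")
```

The patterns are deliberately over-inclusive (a file named `author.py` would also match `*auth*`). For this purpose a false positive costs a few minutes of reviewer time, while a false negative lets unreviewed security code through.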

When reviewing AI output, check for: hallucinated dependencies, deprecated API usage (the AI may have been trained on older versions), hardcoded values that should be configurable, missing error handling and edge cases, and incorrect assumptions about the runtime environment.

Slopsquatting is a real and growing supply chain threat: research shows roughly 20% of LLM code samples recommend non-existent packages, and attackers register those phantom names with malicious payloads. Verify every dependency actually exists and does what you expect. The tools you use to write code are themselves an attack surface; multiple AI coding tools have shipped exploitable vulnerabilities, with disclosure aggregations now tracked in the resources below.
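
A first-pass existence check can be sketched as follows, assuming Python dependencies hosted on PyPI and using PyPI's public JSON endpoint (`https://pypi.org/pypi/<name>/json`). The function names `pypi_metadata` and `find_phantom_dependencies` are illustrative, not part of any tool.

```python
import json
import urllib.error
import urllib.request


def pypi_metadata(name):
    """Return PyPI's JSON metadata for a package, or None if PyPI reports 404."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise  # Other failures are not evidence either way; surface them.


def find_phantom_dependencies(names, lookup=pypi_metadata):
    """Return the names for which the lookup finds no published package.

    `lookup` is injectable so the check can be exercised without network access.
    """
    return [name for name in names if lookup(name) is None]
```

Treat a pass here as necessary, not sufficient. Slopsquatting works precisely because attackers register the phantom name, so a package that exists still needs a look at its maintainers, release history, and source before you install it.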

Different AI models have uncorrelated blind spots. Using a second model to review the first model's output catches substantially more issues than single-pass review (one industry study reports a 3-5x improvement, though this comes from a vendor blog and we have not seen an independent reproduction). This is not about replacing human review; it's about making the code that reaches human review already stronger. Have one model generate, another critique, and a human make the final call.
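
The generate, critique, decide flow can be sketched as a small pipeline. This is a shape, not an implementation: `generate`, `critique`, and `human_decides` are hypothetical callables you would wire to your own model clients and review tooling, and `ReviewPacket` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReviewPacket:
    """Everything the human reviewer needs in one place."""
    code: str
    findings: List[str] = field(default_factory=list)
    approved: bool = False


def adversarial_review(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], List[str]],
    human_decides: Callable[[ReviewPacket], bool],
) -> ReviewPacket:
    """Run one model to generate, a second to critique, then defer to a human."""
    code = generate(task)            # first model writes the code
    findings = critique(task, code)  # second model looks for problems
    packet = ReviewPacket(code=code, findings=findings)
    packet.approved = human_decides(packet)  # the final call is never automated
    return packet
```

Note that the critic's findings only inform the decision. An empty findings list does not set `approved`; only the human step does.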

Be aware of the limitations of LLM-as-judge approaches: model judges are vulnerable to adversarial manipulation and struggle when evaluation requires external context. Adversarial review is a filter, not a guarantee.

Commit early and often. AI-assisted development makes rollback points critical because you need clean boundaries when AI-generated changes don't work out. Never skip source control, and never let AI run unchecked without periodic commits.

You review the AI's approach, not just its output

Ask the AI to explain its code: "Explain the purpose and step-by-step logic. Include alternative implementations considered and why this choice was made." This surfaces assumptions and trade-offs that are invisible in the code alone.

Your team has a governance policy for AI-generated code

This is not bureaucracy; it's the same discipline you apply to any third-party code entering your codebase. As regulatory frameworks like the EU AI Act take full effect, organizations without governance policies will face compliance gaps alongside security ones.

A workable AI governance policy should cover at minimum:

  • Approved tools and models. Which IDEs, CLIs, agents, and providers are allowed; which require approval; which are prohibited.
  • Data classes. What data may be shared with which tools (public, internal, customer or client, regulated, secret).
  • Retention and training settings. Which providers may train on prompts; what retention is acceptable; what must be disabled per contract.
  • Client and regulated-work restrictions. Default rules for code, logs, and telemetry that originated from a client engagement (see Pillar 10).
  • Required human review categories. Which code paths require dedicated security review, dedicated privacy review, or named-owner sign-off before merge.
  • Logging and observability. What AI activity is logged, where, for how long, who can read it.
  • Incident response. Named owner, escalation path, rollback procedure when AI-generated code causes an incident.
  • Accountability. Who is on the hook when AI-generated code ships and breaks.
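
Parts of such a policy can be encoded as data so that tooling can enforce them. The sketch below covers only the first three bullets. It assumes the data classes form a strict sensitivity order, which your policy may not, and the tool names, `ToolPolicy`, and `may_share` are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataClass(IntEnum):
    """Data classes from the policy, ordered by assumed sensitivity."""
    PUBLIC = 0
    INTERNAL = 1
    CUSTOMER = 2
    REGULATED = 3
    SECRET = 4


@dataclass(frozen=True)
class ToolPolicy:
    name: str
    max_data_class: DataClass            # most sensitive class the tool may see
    provider_may_train_on_prompts: bool  # retention and training setting


# Hypothetical entries; real tool names and limits come from your own policy.
APPROVED_TOOLS = {
    "example-ide-assistant": ToolPolicy(
        "example-ide-assistant", DataClass.INTERNAL, False
    ),
    "example-self-hosted-model": ToolPolicy(
        "example-self-hosted-model", DataClass.CUSTOMER, False
    ),
}


def may_share(tool_name, data_class, approved=APPROVED_TOOLS):
    """Return True only if the tool is approved for data of this class."""
    policy = approved.get(tool_name)
    if policy is None:
        return False  # tools not on the list are prohibited by default
    return data_class <= policy.max_data_class
```

For example, `may_share("example-ide-assistant", DataClass.CUSTOMER)` is false under these entries, and so is any call naming a tool that is not on the list. Denying by default means a new tool has to be added to the policy before anyone can use it on real data.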

Cross-check against NIST SSDF for the secure-development hooks and EU AI Act obligations for any work touching the EU market.

Anti-patterns

  • Accepting code that compiles and "looks right" without tracing through the logic
  • Using unfamiliar frameworks on an AI's recommendation without the ability to verify or debug the output
  • Letting AI handle security-sensitive code paths without dedicated human review
  • Skipping version control because "I can always regenerate it"
  • Not specifying framework/library versions, which leads the AI to generate code against deprecated APIs
  • "Proof by AI": using "the AI suggested it" as justification in a code review
  • Installing AI-suggested dependencies without verifying they exist and are maintained (slopsquatting)
  • Relying on a single model to both generate and validate its own output
  • Having no organizational policy on which AI tools are approved and what review is required