Pillar 2: Planning Before Code

Never let AI write code without a plan and a specification.

AI coding assistants are much better at implementation than at understanding business requirements. The quality gap between “I need a user auth system” and a well-specified feature document fed into the same AI is enormous. Front-loading the planning work produces compound returns: better first-pass code, fewer iterations, and artifacts that persist beyond a single session.

Going to AI too quickly is the single biggest waste of time in AI-assisted development. If you don’t have a clear idea of what you want, you can’t put guardrails on it, you can’t evaluate the output, and you’ll burn cycles on rework that costs more than the planning would have.

You practice spec-first development. Specifications are the primary artifact, not code. Documentation, requirements, architectural decisions, and feature specs are not overhead; they are the primary input that enables AI to produce quality output.

For simple tasks, this can be a clear prompt with explicit requirements. For complex features, this means a written spec that covers:

  • User-facing behavior: what it does from the user’s perspective
  • Technical approach: how it should be implemented, including architectural choices with context, alternatives, and trade-offs
  • Constraints: performance, security, compatibility
  • Acceptance criteria: what success looks like, measurably
  • Out-of-scope: what this does NOT do
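Laid out as a document, those sections form a short spec; a skeleton might look like the following (the feature and its details are hypothetical, for illustration only):

```markdown
# Spec: password reset via email

## User-facing behavior
User requests a reset link, receives an email, sets a new password.

## Technical approach
Signed, single-use token in the link (alternative considered: emailed
OTP codes; rejected because links survive email clients better).

## Constraints
Token expires in 30 minutes; request form must not reveal whether an
email address has an account (no user enumeration).

## Acceptance criteria
- Reset completes end-to-end in staging
- Expired or reused tokens are rejected with a clear error

## Out of scope
SMS reset, admin-initiated resets
```

A spec this size takes minutes to write and gives both the AI and the reviewer the same concrete target.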

The spec becomes persistent context that survives session boundaries. When using a framework like OpenSpec or SpecKit, architectural decisions are captured as part of the process. When working manually, use Architecture Decision Records (ADRs) to capture significant technical choices with their reasoning, so the next developer or AI session doesn’t relitigate decisions without understanding the trade-offs.

The key insight: specifications precise enough for an AI to implement are precise enough for a human to review and validate.

You use plan mode before execution mode. Most AI coding tools offer a plan or architect mode that separates research and reasoning from code changes. Use it. For any non-trivial task, the workflow is: “Here’s what I’m trying to accomplish. Create a plan before doing anything.” Review the plan. Give feedback. Only then does execution begin.

For complex work, save the plan as a markdown file so you can reference it later, copy it into new sessions, or adjust without re-explaining everything. The plan/act separation prevents the AI from racing ahead with implementation before you’ve agreed on the approach.

You use the AI as a planning partner, not just an executor. The AI is valuable during planning, not just after it. Before committing to an approach, use it to surface problems: “What are three ways this could fail?”, “What edge cases am I missing?”, “Are there simpler alternatives to this architecture?” Ask it to research constraints - check if the library you plan to use supports your requirements, verify that the API behaves the way you assume, or explore how the existing codebase handles similar patterns. Planning with the AI catches bad assumptions before they become bad code. The key discipline: the AI helps you think, but you make the decisions.

You break work into AI-sized tasks. A feature that would take a human developer a day might need to be split into 5-6 scoped sessions for an AI to execute well. Each task should fit within a single session’s effective context and have a clear, verifiable outcome.

Know where to draw boundaries: by module, by layer (data model, then API, then UI), or by concern (happy path first, then error handling, then edge cases). Large monolithic prompts produce large monolithic output that’s hard to verify. Small, scoped tasks produce small, verifiable results that compose into reliable systems.
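Split along those boundaries, a day-sized feature might become a plan file like this (the feature and task names are illustrative):

```markdown
## Plan: saved searches

- [ ] Task 1 (data model): add `saved_search` table and migration.
      Done when: migration applies and model tests pass.
- [ ] Task 2 (API): CRUD endpoints for saved searches, happy path only.
      Done when: endpoint tests pass against the new table.
- [ ] Task 3 (UI): list, create, and delete saved searches.
      Done when: each flow works manually against the API.
- [ ] Task 4 (error handling): validation, auth failures, quota limits.
      Done when: each failure mode returns a deliberate error.
- [ ] Task 5 (edge cases): duplicate names, deleted filters, empty results.
      Done when: regression tests cover each case.
```

Each line is one session’s worth of work with its own “done when” check, so every session ends in a verifiable state.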

You calibrate planning depth to task size. Not every task needs a spec. A one-line bug fix needs a clear description and a test case, not an ADR. A new feature touching multiple services needs a proper spec with acceptance criteria. Over-planning small tasks wastes more time than it saves - the goal is to front-load thinking proportional to risk and complexity. A useful heuristic: if the task takes less than 15 minutes to implement, a clear prompt is sufficient. If it touches multiple files or systems, write it down. If it’s irreversible or affects production, write a spec and get review.
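The heuristic above can be sketched as a small decision function (the thresholds and labels are illustrative, not prescriptive):

```python
def planning_depth(est_minutes: int, files_touched: int,
                   irreversible_or_prod: bool) -> str:
    """Map task size and risk to a planning artifact, per the heuristic."""
    if irreversible_or_prod:
        return "spec + review"   # irreversible or production-affecting
    if files_touched > 1:
        return "written plan"    # touches multiple files or systems
    if est_minutes < 15:
        return "clear prompt"    # small task: a precise prompt suffices
    return "written plan"

# A quick bug fix vs. a risky cross-service change:
print(planning_depth(10, 1, False))   # clear prompt
print(planning_depth(120, 6, True))   # spec + review
```

The point is not to run such a function, but that the rule is crisp enough to be written down and applied consistently.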

You plan around the existing codebase, not in a vacuum. Most real work is modifying existing systems, not building from scratch. Your plan must account for what is already there. Before committing to an approach, have the AI read the relevant code: “Read the current auth middleware and tell me how requests are authenticated today.” Plans that ignore the current implementation produce code that conflicts with existing patterns, duplicates existing utilities, or breaks assumptions other parts of the system depend on. The best specs include a “Current State” section that describes how things work now before proposing changes.

You know what to do when the plan is wrong. Plans fail. You discover mid-implementation that the database doesn’t support the query you assumed, the library has a dealbreaker limitation, or the existing code is structured differently than expected. The discipline is: stop, don’t hack around it. Go back to the plan, update it with what you learned, and restart implementation from the updated plan. In practice: commit what works so far, update your plan markdown file with the new constraints, and start a fresh session with the corrected plan. The anti-pattern is letting the AI improvise its way around a broken assumption - that produces code that technically works but is architecturally wrong.

You follow the specify, clarify, plan, implement sequence. Whether using a spec-first framework or working manually, the discipline is the same: define what you want, surface ambiguities through clarifying questions, create a plan with discrete tasks, then implement. Skipping the clarify step is where implicit requirements get dropped. Some frameworks automate this sequence (specify, plan, tasks), but the cognitive discipline matters more than the specific tool.

You define acceptance criteria before generating code. Before the AI writes a line, you should know what “done” looks like. Acceptance criteria are disproportionately valuable in AI-assisted work because they give both you and the AI a concrete evaluation target.

If you can’t write the acceptance criteria, you can’t evaluate the output. If you can’t evaluate the output, you shouldn’t be generating it. Criteria can be as simple as “passes these 3 test cases” or as structured as a full test plan, but they must exist before implementation begins. This is the bridge between planning and Pillar 9: Evaluation and Measurement.
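“Passes these 3 test cases” can be taken literally: three assertions written before any implementation exists. A sketch, using a hypothetical `slugify` task (a minimal implementation is included only so the criteria are runnable here):

```python
def slugify(title: str) -> str:
    """Minimal stand-in implementation for the hypothetical task."""
    cleaned = "".join(c if c.isalnum() or c == " " else "" for c in title)
    return "-".join(cleaned.lower().split())

# Acceptance criteria, written BEFORE implementation began:
# Criterion 1: spaces become hyphens, output is lowercase.
assert slugify("Hello World") == "hello-world"
# Criterion 2: punctuation is dropped.
assert slugify("What's New?") == "whats-new"
# Criterion 3: repeated whitespace collapses to a single hyphen.
assert slugify("a   b") == "a-b"
```

Handing the AI the assertions along with the prompt gives it the same definition of “done” that you will use to judge the output.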

You know the prompting vs. RAG vs. fine-tuning decision framework. Before building any LLM-powered feature, you choose the right approach. There is near-universal consensus on the hierarchy: start with prompt engineering (hours to implement, near-zero cost), escalate to RAG when you need current or proprietary data, and fine-tune only when persistent behavioral changes are required (weeks to implement, significant cost).

The key diagnostic question: “Do we need new facts, or new behavior?” New facts point to RAG; if your team is building an internal tool that answers questions about company policies or product docs that change regularly, you need retrieval because the model simply does not have those facts. New behavior (tone, style, complex classification patterns) points to fine-tuning; if you need every response across thousands of requests to match a specific house style and format, and prompt instructions alone are not producing the consistency you need, that is a fine-tuning case.

A critical misconception to avoid: fine-tuning does not reliably inject new knowledge; it changes behavior and style, not factual recall.

Growing context windows (now 1M+ tokens in some models) are shifting some RAG use cases back to prompt engineering, since you can fit entire document sets in-context. If a customer support bot’s full response guidelines and FAQ content fit in a single prompt, start there before building a retrieval pipeline.
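The hierarchy can be condensed into a diagnostic sketch (the 50% context-headroom threshold and the default window size are illustrative assumptions, not established rules):

```python
def choose_llm_approach(needs_new_facts: bool, needs_new_behavior: bool,
                        corpus_tokens: int = 0,
                        context_window: int = 1_000_000) -> str:
    """Apply the 'new facts vs. new behavior' diagnostic from the text."""
    if needs_new_facts:
        # If the whole corpus fits comfortably in-context (rough 50%
        # headroom assumed), prompt-stuff it before building retrieval.
        if corpus_tokens < context_window // 2:
            return "prompt engineering (docs in-context)"
        return "RAG"
    if needs_new_behavior:
        # Only after prompt instructions fail to produce consistency.
        return "fine-tuning"
    return "prompt engineering"

print(choose_llm_approach(True, False, corpus_tokens=40_000))     # fits in context
print(choose_llm_approach(True, False, corpus_tokens=5_000_000))  # needs retrieval
```

Note that the fact-seeking branches never return “fine-tuning”: that encodes the misconception warning above.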

Anti-patterns to avoid:

  • Jumping straight to “build me X” without specifying what X looks like
  • Letting the AI choose technologies, libraries, or patterns without your input
  • Not saving plans for complex work, then losing context when the session ends or the window fills
  • Changing feature scope mid-session instead of starting fresh with an updated spec
  • Treating product management as something that happens after the code is written
  • Generating code without knowing what “done” looks like (no acceptance criteria)
  • Giving the AI a full feature as one task instead of decomposing into verifiable steps
  • Assuming fine-tuning will fix factual accuracy problems (it won’t; that’s RAG’s job)
  • Reaching for RAG when the data would fit in a single prompt with a large context window
  • Making architectural decisions in chat and never recording the reasoning
  • Writing a full spec for a 5-minute bug fix (planning depth should match task complexity)
  • Planning in a vacuum without reading the existing code first
  • Letting the AI improvise around a broken assumption instead of stopping and re-planning
  • Never using the AI during planning (it’s a thinking partner, not just an executor)