Pillar 3: Prompt Engineering

Prompting is a core engineering skill that requires practice and intentionality.

The difference between a vague request and a well-structured prompt is often the difference between usable output and wasted time. Prompt engineering is not about memorizing magic phrases. It is about communicating clearly, providing the right context, and understanding how to guide an AI toward the output you need.

Words matter. "Can you take a look at this and see why it's not working" produces a very different result than "I tested [files] and expected [outcome] but [result] occurred instead. Analyze [files] and identify the root cause. Share your thinking with me."

What We Expect

You write clear, specific prompts with explicit requirements

Be unambiguous about what you want. Include: what you're building, what constraints apply, what patterns to follow, what success looks like. Provide examples of the code style you want when it matters.
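As a minimal sketch of what such a prompt could look like when assembled in code, the Python helper below joins the four elements listed above into labeled sections. The function name `build_prompt`, its parameters, the section labels, and the sample task are all illustrative assumptions, not a prescribed format.

```python
def build_prompt(task, constraints, patterns, success_criteria, style_example=None):
    """Assemble a prompt with explicit, labeled requirements.

    Every name and section label here is hypothetical; adapt them to your team.
    """
    sections = [
        ("Task", task),
        ("Constraints", "\n".join(f"- {c}" for c in constraints)),
        ("Patterns to follow", "\n".join(f"- {p}" for p in patterns)),
        ("Success looks like", "\n".join(f"- {s}" for s in success_criteria)),
    ]
    # Only include a style example when one is supplied.
    if style_example:
        sections.append(("Example of the code style I want", style_example))
    return "\n\n".join(f"{title}:\n{body}" for title, body in sections)


# Hypothetical task, used only to show the resulting structure.
prompt = build_prompt(
    task="Add pagination to the orders list endpoint.",
    constraints=["Do not change the response schema for existing fields."],
    patterns=["Follow the cursor pagination used by the customers endpoint."],
    success_criteria=["Existing tests pass and a new test covers an empty page."],
)
print(prompt)
```

Keeping each requirement in its own labeled section makes a missing one easy to spot before the prompt is sent.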

You adapt your prompting style to the model family you're using

Different model families respond differently to the same prompting techniques because they are trained differently. Anthropic models respond well to XML tags as structural delimiters and tend to follow instructions as strict rules. OpenAI models accept both Markdown and XML structuring; the distinction that matters more there is between reasoning models and GPT models. Reasoning models prefer high-level guidance, while GPT models prefer precise step-by-step instructions. Newer GPT models tend toward verbosity unless the expected length is explicitly bounded.

Read the prompting guides for the models you use most: the Anthropic prompting guide and OpenAI prompting guide both document model-specific behaviors that affect real-world results. A prompt that works perfectly on one model family may need restructuring for another.
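One way to keep a prompt portable across model families is to store its content separately from its delimiters. The sketch below renders the same sections either with XML tags or with Markdown headers. The function `format_sections`, the style names, and the sample section text are assumptions made for illustration; check the vendor guides for what each model actually prefers.

```python
def format_sections(sections, style):
    """Render named prompt sections as XML-tagged or Markdown-headed text.

    `style` is "xml" or "markdown"; both names are conventions of this sketch.
    """
    if style == "xml":
        return "\n".join(
            f"<{name}>\n{body}\n</{name}>" for name, body in sections.items()
        )
    if style == "markdown":
        return "\n\n".join(f"## {name}\n{body}" for name, body in sections.items())
    raise ValueError(f"unknown style: {style}")


# Hypothetical content; only the delimiters change between the two renderings.
sections = {
    "context": "The service is a Flask app with a PostgreSQL database.",
    "instructions": "Identify the root cause of the failing test.",
}
xml_prompt = format_sections(sections, "xml")
md_prompt = format_sections(sections, "markdown")
print(xml_prompt)
print(md_prompt)
```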

You ask the AI to restate before it executes

Before letting the AI write code, ask it to surface its understanding in a form you can audit. Effective patterns:

"Restate the task, the assumptions you are making, the files you intend to read, the risks you see, and how you will verify your work."
"List the questions you would need answered to be confident in this implementation, ranked by how much they would change your approach."
"Describe what you would do first, and what you would do only after seeing the result of the first step."

Restate-then-act prompts surface misunderstandings; "do you understand?" invites the model to say yes whether it does or not. Per Pillar 0, yes-or-no questions are a sycophancy attractor. Avoid them as your verification step.
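If you send the same kind of task often, the restate request can be appended automatically. The following is a small sketch under that assumption: `with_restate_step` and the sample task are hypothetical, and the closing "wait for my confirmation" sentence is an addition of this example rather than part of the patterns quoted above.

```python
RESTATE_REQUEST = (
    "Before writing any code, restate the task, the assumptions you are making, "
    "the files you intend to read, the risks you see, and how you will verify "
    "your work. Wait for my confirmation before you start."
)


def with_restate_step(task_prompt):
    """Append an auditable restate request instead of a yes/no check."""
    return f"{task_prompt}\n\n{RESTATE_REQUEST}"


# Hypothetical task used only for demonstration.
restate_prompt = with_restate_step("Migrate the session store from memory to Redis.")
print(restate_prompt)
```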

You understand reasoning models and when to allocate deeper thinking

Some models and modes dedicate more computation to step-by-step reasoning before producing output. Claude's extended thinking, OpenAI's reasoning models (the o-series), and similar features across other tools all offer ways to control this reasoning budget. Reasoning tokens are billed as output tokens and they occupy context window space, so the cost is real.

Use deeper thinking for debugging, architecture decisions, and complex logic. Use lighter modes for routine implementation. OpenAI frames reasoning effort as "a tuning knob, not the primary way to recover quality": if a prompt is failing, fix the prompt before throwing more reasoning at it. Knowing when to pay the extra latency and token cost is a practical skill that directly affects output quality.
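The rule of thumb above can be written down as a simple lookup so the choice is deliberate rather than habitual. This is only a sketch: the task categories, the labels "deep" and "light", and the function `pick_reasoning_level` are assumptions, and you would map the labels onto whatever setting your tool actually exposes.

```python
# Task types that, per the guidance above, justify the extra latency and tokens.
DEEP_THINKING_TASKS = {"debugging", "architecture", "complex_logic"}


def pick_reasoning_level(task_type):
    """Return a hypothetical reasoning label for a given task type."""
    return "deep" if task_type in DEEP_THINKING_TASKS else "light"


for task in ("debugging", "routine_implementation"):
    print(task, "->", pick_reasoning_level(task))
```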

You use meta-prompting to generate and refine prompts

Meta-prompting is the practice of using the LLM itself to create, critique, and improve prompts. Instead of manually crafting a complex prompt from scratch, ask the model to generate prompt variations, evaluate their quality, and refine based on criteria you define.

Common applications: ask the model for several prompt options, ask it to critique its own draft prompts, and refine through multiple generations until the output stabilizes. These are daily tools, not occasional tricks.
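The critique-and-refine cycle described above can be sketched as a short loop. The code assumes a caller-supplied `complete` function that sends text to a model and returns its reply; that function, `refine_prompt`, and the wording of the two requests are hypothetical. A stub model stands in for a real call so the example runs offline.

```python
from typing import Callable


def refine_prompt(
    draft: str,
    criteria: str,
    complete: Callable[[str], str],
    max_rounds: int = 3,
) -> str:
    """Ask the model to critique and rewrite a prompt until it stops changing."""
    current = draft
    for _ in range(max_rounds):
        critique = complete(
            "Critique this prompt against the criteria.\n\n"
            f"Criteria:\n{criteria}\n\nPrompt:\n{current}"
        )
        revised = complete(
            "Rewrite the prompt to address the critique. Return only the prompt.\n\n"
            f"Prompt:\n{current}\n\nCritique:\n{critique}"
        )
        # Stop once a round produces no change, i.e. the output has stabilized.
        if revised.strip() == current.strip():
            break
        current = revised
    return current


def fake_model(request: str) -> str:
    """Stand-in for a real model call; returns canned text for the demo."""
    if request.startswith("Critique"):
        return "State the expected output format."
    return "Summarize the diff. Respond as a bulleted list."


final_prompt = refine_prompt(
    "Summarize the diff.", "Must state the output format.", fake_model
)
print(final_prompt)
```

Passing the model call in as a parameter keeps the loop independent of any particular vendor SDK.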

You know the established prompting techniques by name and application

The field has a shared vocabulary backed by published research. You should know and apply:

Zero-shot vs. few-shot prompting - when to include examples and how many
Chain-of-thought - eliciting step-by-step reasoning that often improves performance on multi-step problems; treat the trace as a generation aid, not a faithful explanation of internal computation
ReAct - combining reasoning with tool use, the foundation of agent loops
Reflexion - self-evaluation and correction loops
Image prompting - leveraging multi-modal input for UI work, visual debugging, and diagram interpretation

You don't need to cite the papers, but when someone says "use few-shot with chain-of-thought," you should know what that means and why it works. See Pillar 0: LLM Foundations for the conceptual grounding behind why these techniques work.
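To make "few-shot with chain-of-thought" concrete, the sketch below builds a prompt in which each example shows its reasoning before its answer, and the final question ends with an open reasoning slot. The function `few_shot_cot_prompt`, the "Q / Reasoning / A" labels, and the sample questions are illustrative assumptions.

```python
def few_shot_cot_prompt(examples, question):
    """Build a few-shot prompt whose examples include worked reasoning.

    `examples` is a list of (question, reasoning, answer) tuples.
    """
    blocks = [f"Q: {q}\nReasoning: {r}\nA: {a}" for q, r, a in examples]
    # Leave the reasoning open so the model continues in the same pattern.
    blocks.append(f"Q: {question}\nReasoning:")
    return "\n\n".join(blocks)


cot_prompt = few_shot_cot_prompt(
    examples=[
        (
            "A batch job processes 40 records per minute. How long for 200 records?",
            "200 records divided by 40 records per minute is 5 minutes.",
            "5 minutes",
        )
    ],
    question="A batch job processes 25 records per minute. How long for 150 records?",
)
print(cot_prompt)
```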

You iterate on prompts rather than iterating through conversation

When AI output misses the mark, the instinct is to keep refining in conversation. The better approach for repeated tasks is to go back and improve the original prompt or documentation. This builds reusable context instead of one-off fixes.

You use structural emphasis, not aggressive language, to surface critical instructions

Modern frontier models are responsive enough to system prompts that aggressive caps-lock markers (IMPORTANT, CRITICAL, MUST) can cause overtriggering. Anthropic's current Opus prompting guide explicitly recommends dialing back this language and using normal prompting instead. Recent research on emphasis markers reaches a similar conclusion: a couple of marker tokens carry too little semantic weight to reliably shift attention.

What still works: markdown formatting (bold, headers) for structural emphasis, position-based placement (start and end of context per Pillar 1's "Lost in the Middle" rule), and explaining the why behind an instruction rather than shouting it. Treat caps-lock attention markers as a 2023-era tool that has aged out.
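A quick way to audit existing prompts for this habit is to scan them for the markers named above. The following is a minimal sketch, not a complete linter: `find_caps_markers` is a hypothetical helper, the marker list is limited to the three words mentioned in this section, and the sample prompt is invented for the demonstration.

```python
import re

# Case-sensitive on purpose: lowercase "must" in ordinary prose is not flagged.
CAPS_MARKERS = re.compile(r"\b(IMPORTANT|CRITICAL|MUST)\b")


def find_caps_markers(prompt):
    """Return (line_number, marker) pairs for each caps-lock marker found."""
    return [
        (line_number, match.group(0))
        for line_number, line in enumerate(prompt.splitlines(), start=1)
        for match in CAPS_MARKERS.finditer(line)
    ]


sample = (
    "IMPORTANT: never edit migrations.\n"
    "You must run the tests.\n"
    "CRITICAL!!! MUST pass."
)
hits = find_caps_markers(sample)
print(hits)
```

Each hit is a candidate for rewriting as a plain instruction with its reason attached, or for moving to the start or end of the context.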

Anti-patterns

Vague prompts that leave the AI guessing about requirements, patterns, or scope
Never asking the AI to explain its understanding before it starts working
Asking the AI "do you understand?" or any yes/no verification question that invites agreement rather than restatement
Using the same generic prompting style for debugging, code generation, architecture review, and planning
Not including screenshots or images when describing UI work (multi-modal input exists, use it)
Spending time iterating in conversation when the real fix is improving your rules file or spec
Leaning on caps-lock markers (IMPORTANT, CRITICAL, MUST) on modern frontier models, where they cause overtriggering rather than improved instruction following

Resources

Prompting Guide: Techniques - Comprehensive index of named prompting strategies with research citations
Anthropic Prompt Engineering Overview - Anthropic's official prompting guide with iteration strategies
Understanding Reasoning LLMs - Deep explainer on how reasoning models work and when they help
Meta-Prompting: Task-Agnostic Scaffolding - Research paper on using LLMs to generate and refine prompts
OpenAI Prompt Engineering Guide - OpenAI's official prompting guide with model-specific techniques
Verbalized Sampling - Technique for increasing output diversity
See Learning Paths for deeper dives
