Pillar 7: Workflow and Tooling

Be the orchestrator, not the bottleneck. Leverage agents, sessions, and tools.

The math of productivity changed when AI agents became capable of autonomous work. Every moment you spend on delegatable work blocks not just you, but all the parallel processes you could have spawned. Think like a CPU scheduler: your attention is the scarcest resource in the system. Before touching any task, ask yourself whether it could run in parallel while you work on something else.

That said, autonomy is a dial, not a switch. Start with tight human-in-the-loop steering (max 3 turns between check-ins) until you trust the prompt and verification criteria, then open it up for longer autonomous runs.

You understand session management and use it deliberately. Start new sessions for new tasks. Most AI coding tools support resuming previous sessions and labeling them for retrieval (e.g., Claude Code’s /resume and /rename). Your session strategy directly affects context quality: a clean session for a focused task produces better results than a sprawling conversation covering multiple concerns.

You give your AI tools, not just instructions. Why describe your database schema when the AI could query it directly? Why explain API contracts when it could read the OpenAPI spec through MCP? The AI with tools beats the AI without tools every time. Invest in configuring MCP servers, browser automation, and test runners that the AI can use in its agent loop.
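To make "tools, not instructions" concrete, here is a minimal sketch of the schema case: rather than pasting a prose description of your tables into the prompt, you expose a function the agent can call to read the live DDL. The function name and the demo table are hypothetical; a real setup would wire this up through an MCP server or your tool's function-calling interface.

```python
import sqlite3

def get_schema(conn: sqlite3.Connection) -> str:
    """A 'tool' the agent can call to read the real schema instead of a paraphrase."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n\n".join(sql for (sql,) in rows)

# Demo with a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
print(get_schema(conn))  # the agent sees the actual DDL, never a stale summary
```

The payoff is that the schema can change without the prompt going stale: the tool always answers from the source of truth.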

You understand the model landscape well enough to make informed choices. The gap between top models has compressed significantly; the difference between the top model and the 10th-ranked is now roughly 5%. Different models lead different capability domains: some lead coding benchmarks (SWE-bench), others lead multimodal (MMMU-Pro), others offer the largest context windows. Open-source models provide 10-100x cost savings for simpler tasks. You should understand what benchmarks measure and their limitations: MMLU and HumanEval are saturated for frontier models and no longer differentiate them. More meaningful signals come from SWE-bench (real code on actual GitHub issues), LM Arena Vision (human preference for multimodal), and independent reproductions like Artificial Analysis and Vals.ai. Task-specific evaluation on your representative workloads matters more than any leaderboard score.

You know when to use different models for different tasks. Model selection is task-dependent, not one-size-fits-all. A practical starting point: reserve your most capable model for roughly 30% of your work (complex reasoning, architectural planning, tricky debugging) and use a faster, cheaper model for the remaining 70% (routine implementation, boilerplate, test writing, documentation). This split yields better cost-to-value than running everything on the most expensive model.
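The 30/70 split can live in a few lines of routing logic. A minimal sketch, assuming hypothetical model names and task categories; substitute your provider's actual models and whatever task taxonomy fits your work:

```python
# Hypothetical model identifiers -- replace with your provider's real names.
CAPABLE_MODEL = "frontier-large"
FAST_MODEL = "frontier-mini"

# The ~30% of tasks that justify the premium model.
PREMIUM_TASKS = {"architecture", "debugging", "complex-reasoning"}

def pick_model(task_type: str) -> str:
    """Route hard tasks to the capable model, routine ones to the cheap one."""
    return CAPABLE_MODEL if task_type in PREMIUM_TASKS else FAST_MODEL

print(pick_model("debugging"))    # frontier-large
print(pick_model("boilerplate"))  # frontier-mini
```

Even this crude static mapping captures most of the cost-to-value benefit; production routers refine it by scoring query difficulty instead of relying on fixed categories.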

Model routing (directing simple queries to cheap/fast models and complex queries to expensive/capable ones) is becoming a production standard. Research from UC Berkeley’s RouteLLM (ICLR 2025) demonstrated 85%+ cost reduction while maintaining 95% of top-model quality.

You treat cost awareness as a professional skill. All major providers charge separately for input and output tokens, with output tokens costing 3-5x more than input. The cost difference between model tiers within a single provider can be 10-15x or more. Check your provider’s current pricing page; these numbers shift regularly as competition drives costs down.
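The arithmetic is worth internalizing. A small sketch with illustrative per-million-token prices (the 5x output premium and the roughly 12x tier gap below are assumptions for the example, not any provider's actual rates):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one request; prices are per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Illustrative prices only -- check your provider's current pricing page.
premium = request_cost(50_000, 5_000, in_price=3.00, out_price=15.00)
light   = request_cost(50_000, 5_000, in_price=0.25, out_price=1.25)

print(f"premium: ${premium:.4f}")  # $0.2250
print(f"light:   ${light:.4f}")    # $0.0188
```

Note that even though output tokens cost 5x more here, the 50k-token input dominates the bill, which is why large context windows deserve scrutiny.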

Three cost levers matter most: prompt caching (saves 60-90% on repeated prefixes), batch APIs (significant discounts for latency-insensitive workloads), and model routing (directing simple tasks to cheaper models). Developers who ignore cost optimization tend to fail in one of two ways: they burn through budgets until their AI access gets revoked, or they assume everything is expensive and avoid using AI where it would genuinely help. Track your usage so you can make data-informed decisions about where premium models earn their price and where lighter models do the job.
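To see why prompt caching is the biggest lever for agentic sessions, consider a multi-turn session that resends a large shared prefix (system prompt, specs, schema) on every turn. The sketch below assumes illustrative prices and a 90% discount on cached prefix reads; real discounts and cache mechanics vary by provider:

```python
IN_PRICE, OUT_PRICE = 3.00, 15.00   # $ per million tokens, illustrative only
CACHED_IN_PRICE = IN_PRICE * 0.1    # assumed 90% discount on cached prefix reads

def session_cost(turns: int, prefix_toks: int, turn_in_toks: int,
                 turn_out_toks: int, cached: bool) -> float:
    """Dollar cost of a session that resends a shared prefix every turn."""
    prefix_price = CACHED_IN_PRICE if cached else IN_PRICE
    per_turn = (prefix_toks * prefix_price
                + turn_in_toks * IN_PRICE
                + turn_out_toks * OUT_PRICE) / 1_000_000
    return turns * per_turn

uncached = session_cost(20, 30_000, 1_000, 800, cached=False)
cached = session_cost(20, 30_000, 1_000, 800, cached=True)
print(f"uncached ${uncached:.2f} vs cached ${cached:.2f}")
```

With these numbers the session drops from about $2.10 to about $0.48, a roughly 77% saving from caching alone, squarely in the 60-90% range cited above.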

You leverage structured workflows for complex tasks. For work that exceeds a single session’s capacity, use spec files and plan documents as state persistence. Your progress lives in markdown files with completion status. Combined with session resume, this gives you workflow resilience: if you are interrupted or the session ends, you pick up exactly where you left off. See Pillar 2: Planning Before Code for how to structure these artifacts.

You use version control as your safety net for autonomous runs. Before letting an AI run autonomously, ensure you have a clean commit point. Check the AI’s work at intervals. The longer the leash, the more important the rollback strategy. See Pillar 6: Verification and Security for the full verification framework.

Anti-patterns to avoid:

  • Doing work manually that agents can do in parallel
  • Using the same session for hours across multiple unrelated tasks
  • Not configuring tools (MCP, hooks, test runners) that would make the AI self-sufficient
  • Using the cheapest model for complex reasoning tasks where a more capable model would save time on rework
  • Running everything on the most expensive model without considering whether a lighter model would produce equivalent results
  • Not tracking token usage or understanding the cost implications of large context windows
  • Running long autonomous sessions without commit checkpoints or verification criteria
  • Not learning the slash commands and capabilities of your tools; new features ship regularly