Current Toolchain
Last Updated: April 2026
This is our opinionated, point-in-time stack. The pillars are tool-agnostic and durable. This file is not. Expect it to change as the ecosystem evolves.
AI Coding Assistants
Daily Driver: Claude Code
Claude Code is our primary AI coding tool. It runs in the terminal, understands your codebase through agentic exploration, and supports the full development lifecycle from planning through implementation. Key reasons we chose it:
- Agentic mode with tool use (file editing, shell commands, web search) in a single session
- CLAUDE.md and rules files for persistent project context and coding standards
- Hooks system for enforcing guardrails automatically (linting, formatting, test runs)
- Session persistence with /resume and /rename for picking up where you left off
- Remote control for mobile monitoring of long-running tasks
- Skills and plugins ecosystem for extending capabilities
- Scheduled tasks for recurring automation
We use Opus 4.7 for complex planning, architecture decisions, and code review. Sonnet 4.6 handles routine implementation work at lower cost with strong quality. Our rough split: Opus for ~30% of work (planning, debugging, architecture), Sonnet for ~70% (implementation, tests, boilerplate). This maps to the 70/30 model strategy described in Pillar 7.
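The 70/30 split can be sketched as a simple routing rule. The task categories and model labels below are our own convention for illustration, not API identifiers:

```python
# Hypothetical task router reflecting the ~30/70 Opus/Sonnet split described above.
# Task categories and model labels are illustrative, not pinned API names.
PLANNING_TASKS = {"planning", "architecture", "debugging", "code-review"}

def pick_model(task_type: str) -> str:
    """Route high-leverage reasoning work to the larger model,
    routine implementation work to the cheaper one."""
    if task_type in PLANNING_TASKS:
        return "opus"    # ~30% of work: planning, debugging, architecture
    return "sonnet"      # ~70% of work: implementation, tests, boilerplate
```

The point is that the routing decision is made per task, not per project: a single feature might use Opus for the plan and Sonnet for the implementation.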
Secondary: Codex
OpenAI’s Codex serves as our secondary when we need a different model perspective or when a task benefits from OpenAI’s ecosystem. Useful for cross-checking architectural decisions against a second model’s reasoning and for tasks where model diversity catches blind spots.
Spec and Planning Tools
Primary: OpenSpec.dev
OpenSpec is our primary spec-first development tool. It follows a structured workflow: specify, clarify, plan, task, implement. By researching the codebase before generating code, it catches implicit requirements that raw prompting misses. It is also lighter on tokens than alternatives, which makes it practical for daily use on Pro plans across both brownfield and greenfield work.
The core discipline OpenSpec enforces (and that we expect even without the tool): research the codebase first, surface assumptions through clarifying questions, create a plan with discrete tasks, then implement. See Pillar 2: Planning Before Code.
Secondary: SpecKit
SpecKit’s heavier research phase produces meaningfully better output when thoroughness matters more than cost. It generates massive context files and catches requirements hiding in code comments. The token cost is real, so we reach for SpecKit when a project demands deep codebase analysis upfront.
Dev Infrastructure
CLAUDE.md / Rules Files
Every project has a CLAUDE.md (or equivalent rules file) checked into source control. This is non-negotiable. It contains project context, coding standards, architecture patterns, and explicit “do not” instructions. See Pillar 1: Context Engineering.
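A minimal CLAUDE.md might look like the following. The project name, stack, and specific rules are illustrative, not a mandated template:

```markdown
# Project: Acme Billing API

## Context
- Python 3.12, FastAPI, PostgreSQL via SQLAlchemy 2.x
- Monorepo: service code in src/, tests in tests/

## Standards
- Type hints on all public functions; run ruff and mypy before committing
- Every behavior change ships with a test

## Do not
- Do not touch migrations/ without an explicit request
- Do not add new dependencies without flagging them in the plan first
```

The "Do not" section is the part teams most often skip, and the part that pays off fastest.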
MCP Servers
We use MCP servers to give our AI tools direct access to external systems: database schemas, API documentation, browser automation, and project management tools. Giving the AI tools to work with, rather than describing systems manually, consistently produces better results.
Common MCP servers in our stack:
- Playwright for browser automation and UI verification
- Database connectors for schema introspection
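Project-scoped MCP servers are typically declared in a .mcp.json checked into the repo. A sketch wiring up the two servers above might look like this; the package names and connection string are illustrative, so check each server's own docs for the exact invocation:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    "postgres": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-postgres", "postgresql://localhost/dev"]
    }
  }
}
```

Checking this file into source control means every developer's AI session gets the same tool access.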
Hooks
Claude Code hooks enforce guardrails at the tool level: pre-commit linting, automatic test runs after code changes, formatting enforcement. These are configured per-project and checked into source control so every developer gets the same constraints. See Pillar 5: Guardrails and Quality.
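As a sketch, a PostToolUse hook in .claude/settings.json can run the formatter after every file edit. The matcher syntax and command below are illustrative; consult the hooks documentation for your Claude Code version:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "ruff format ." }
        ]
      }
    ]
  }
}
```

Because the hook fires at the tool level, the AI cannot skip it the way it might skip an instruction buried in a prompt.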
Skills
Reusable skill packages extend AI capabilities for specific tasks (frontend design, document generation, etc.). We install relevant skills globally and per-project as needed. This ecosystem is growing fast; check skills.sh for what is available.
promptfoo
Our recommended tool for prompt evaluation. promptfoo is open source and lets you define test cases, run them against multiple prompts or models, and compare results systematically. Use it to evaluate rules files, test prompt variations, benchmark model selection for specific tasks, and catch regressions when you change your AI configuration. Prompts are software; promptfoo lets you test them like software. See Pillar 9: Evaluation and Measurement.
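A minimal promptfooconfig.yaml sketch, assuming promptfoo's declarative prompts/providers/tests format; the provider identifiers and test content are placeholders:

```yaml
# Compare two prompt variants across models on the same test cases.
prompts:
  - "Summarize in one sentence: {{article}}"
  - "You are a precise technical editor. Summarize in one sentence: {{article}}"

providers:
  # Placeholder model ids; substitute whatever you are benchmarking.
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-latest

tests:
  - vars:
      article: "The release notes describe a breaking change to the auth API."
    assert:
      - type: contains
        value: "auth"
```

Running the eval (typically `promptfoo eval`) produces a side-by-side matrix of every prompt against every provider, which is how regressions in a rules-file change become visible.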
Version Control
Git is non-negotiable. Commit early, commit often. AI-assisted development makes aggressive version control even more critical because you need clean rollback points when AI-generated changes don’t work out.
Voice as Input
SuperWhisper
Our recommended voice input tool. SuperWhisper runs locally on macOS and provides high-quality speech-to-text that feeds directly into any text field, including your terminal and editor. It keeps you in flow by letting you dictate prompts, requirements, and architectural thinking faster than you can type. Particularly effective for ideation, specification drafting, and talking through a problem before committing to code.
Voice dictation is an underused input method for AI-assisted development. The AI handles the messiness of spoken language well, and you can always refine the prompt after dictation. Some developers use voice for the initial brain dump and then edit for precision before sending.
Voice is not a replacement for written specs or structured prompts. It is an accelerant for getting ideas out of your head and into the AI’s context quickly.
Cost Awareness
AI model pricing spans a wide range. As of April 2026, input token costs range roughly from $0.25 to $15 per million tokens across major providers, with output tokens running $1.25 to $75 per million. That is a 60x spread. Understanding where your usage falls on that spectrum is a professional responsibility, not an optional concern.
Practical cost levers to be aware of: model tier selection (the 70/30 strategy above), context window size (larger contexts cost more per request), prompt length (concise prompts save tokens without sacrificing quality), and caching (reusing context across related requests). See Pillar 7: Workflow and Tooling for principles on model selection and cost optimization.
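To make the spread concrete, here is a back-of-the-envelope comparison. The per-million-token prices are illustrative points inside the ranges cited above, not quotes for any specific model, and the monthly volumes are hypothetical:

```python
# Back-of-the-envelope token cost comparison.
# Prices are illustrative points inside the $/M-token ranges cited above.
def cost_usd(input_toks: int, output_toks: int,
             in_price_per_m: float, out_price_per_m: float) -> float:
    """Total cost for a given token volume at given per-million prices."""
    return (input_toks / 1e6) * in_price_per_m + (output_toks / 1e6) * out_price_per_m

# A hypothetical month of usage: 50M input tokens, 10M output tokens.
premium = cost_usd(50_000_000, 10_000_000, 15.00, 75.00)  # top-of-range pricing
budget  = cost_usd(50_000_000, 10_000_000, 0.25, 1.25)    # bottom-of-range pricing
blended = 0.3 * premium + 0.7 * budget                    # the 70/30 split above

print(f"premium: ${premium:,.2f}")   # $1,500.00
print(f"budget:  ${budget:,.2f}")    # $25.00
print(f"blended: ${blended:,.2f}")   # $467.50
```

Even at these rough numbers, routing 70% of work to the cheaper tier cuts the bill by roughly two thirds versus running everything on the premium model.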