Learning Paths

Curated public resources for going deeper on each pillar of AI-assisted development. We favor quality over quantity, and each entry includes a brief annotation on why it is worth your time.

Last Updated: April 2026 - Resources are reviewed quarterly. If a link is dead or outdated, open a PR.

Staying Current

These are the resources our team actually uses to keep up with the AI development landscape.

Newsletters

TLDR AI - Daily digest covering AI business, tech/research, and notable GitHub repos. Curated and concise.
Alpha Signal - AI-focused newsletter with research highlights and tool announcements.

Podcasts

AI Daily Brief - Daily episodes covering AI news plus one deep-dive story. The most current and consistent AI podcast we have found. Also available as an audio-only podcast.

YouTube

Theo / t3dotgg - Covers major AI and web development news almost daily. Opinionated but well-informed; use your own judgment.

By Pillar

LLM Foundations (Pillar 0)

The Illustrated Transformer - The canonical visual explanation of the transformer architecture. Used in courses at Stanford, Harvard, and MIT. Start here for understanding how self-attention lets the model weigh the relevance of every part of the input when producing each part of the output, why parallelization works during training, and why long-range dependencies hold up in ways prior architectures could not match.
HuggingFace LLM Course: Tokenizers (Chapter 6) - Technical reference covering the BPE, WordPiece, and Unigram tokenization algorithms (Unigram is the one commonly paired with SentencePiece). Essential for understanding why code tokenizes less efficiently than prose, why unusual variable names consume more tokens than common ones, and how token boundaries affect cost and quality.
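To make the "unusual names cost more tokens" point concrete, here is a minimal sketch of BPE training and tokenization. It is a toy under stated assumptions: the corpus is hypothetical, base symbols are single characters, and real tokenizers add byte-level handling, pre-tokenization, and much larger vocabularies.

```python
from collections import Counter


def apply_merge(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merges, always merging the most frequent adjacent pair."""
    words = Counter(tuple(word) for word in corpus)  # every word starts as single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break  # every word is already a single token
        best = max(pairs, key=pairs.get)
        merges.append(best)
        updated = Counter()
        for symbols, freq in words.items():
            updated[apply_merge(symbols, best)] += freq
        words = updated
    return merges


def tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply the learned merges, in the order they were learned, to a new word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


# Hypothetical corpus: identifiers that appear often in the training data.
CORPUS = ["return"] * 50 + ["value"] * 30 + ["result"] * 20
MERGES = train_bpe(CORPUS, num_merges=20)
```

With this toy corpus, `tokenize("return", MERGES)` yields one token, `tokenize("retval", MERGES)` yields two (`ret` and `val`), and `tokenize("xq_tmp", MERGES)` falls back to six single-character tokens because none of its character pairs were ever merged.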
Understanding Tokens, Embeddings, Vectors, and Similarity - Bridge piece connecting tokens to embeddings to vector similarity. Good companion to the HuggingFace doc for building the full mental model.
The Illustrated Word2Vec - Visual guide to word embeddings and vector similarity from the same author as the Illustrated Transformer. Foundational for understanding how RAG retrieval works.
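The vector-similarity idea behind these two resources fits in a few lines. This is a sketch with hand-made, hypothetical 3-dimensional vectors; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 is same direction, 0.0 is orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def nearest(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(query, vectors[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "banana": [0.1, 0.0, 0.9],
}
```

Querying with the `king` vector ranks `king` first and `queen` second, with `banana` far behind. RAG retrieval is the same ranking step applied to chunk embeddings instead of word embeddings.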
LlamaIndex: Understanding RAG - Walkthrough of the full RAG pipeline: loading, indexing, embedding, retrieval, generation. Each stage has knobs that affect output quality. The practical skill this resource gives you is reasoning about which stage is causing a bad answer: retrieval returned the wrong chunks, the model hallucinated despite good context, or the documents were chunked poorly at ingestion.
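The stages can be sketched end to end without any framework. This is not LlamaIndex's API; it is a toy pipeline under loud assumptions: the documents are hypothetical, the "embedding" is a bag-of-words count rather than a learned vector, and generation stops at building the prompt.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 12) -> list[str]:
    """Indexing: split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Embedding stand-in: lowercase term counts (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity over sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Retrieval: rank stored chunks by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: similarity(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Generation input: the retrieved context plus the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Loading: hypothetical documents standing in for a real corpus.
DOCS = [
    "Deploys run from the main branch. Rollbacks use the previous release tag.",
    "The staging database is refreshed nightly from an anonymized snapshot.",
]
INDEX = [(c, embed(c)) for doc in DOCS for c in chunk(doc)]
```

Keeping each stage as its own function is the point: when an answer is bad, print what `retrieve` returned before blaming the model. If the right chunk is missing, the problem is in chunking, embedding, or retrieval, not generation.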
Prompting Guide: LLM Settings - Vendor-agnostic explanation of temperature, top-p, top-k, and other inference parameters. Practical bands to start from: 0.0-0.2 for code generation and data extraction, 0.5-0.7 for general tasks, 0.7-1.0 for creative work. Don't adjust temperature and top-p at the same time; tune one at a time. Read alongside the provider-specific notes: Anthropic defaults temperature to 1.0, OpenAI defaults vary by model, and many reasoning models restrict or fix temperature entirely. The same value produces different behavior across providers.
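For intuition on what these two knobs do to the next-token distribution, here is a minimal sketch. The logits are hypothetical, and providers may implement sampling differently; treating temperature 0 as greedy selection is a common convention, assumed here.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumed convention: temperature 0 means greedy, all mass on the top logit.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], p: float) -> list[float]:
    """Keep the smallest set of top tokens whose cumulative probability reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


# Hypothetical logits for three candidate tokens.
LOGITS = [2.0, 1.0, 0.0]
COLD = softmax_with_temperature(LOGITS, 0.2)  # code-generation band
HOT = softmax_with_temperature(LOGITS, 1.0)   # creative band
```

`COLD` puts nearly all probability on the first token while `HOT` spreads it out. Both knobs reshape the same distribution, which is why changing them together makes it hard to tell which one caused a change in output.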
Reducing Sycophancy: 8 Practical Counters - The full checklist for countering RLHF-induced agreement bias. Use these together rather than picking one:
- Don't lead the witness. "Compare React and Vue performance, include cases where each wins" not "Isn't React faster than Vue?"
- Remove emotional framing. "Review this architecture for flaws. Be direct." not "I spent three weeks on this, what do you think?"
- Ask for disagreement explicitly. "Challenge my assumptions. If my premise is wrong, say so." Gives the model permission to override its agreement training.
- Ask for counterarguments. "Give me 3 reasons this approach might fail" or "Steel-man the opposing view." Forces the model into critical mode.
- Use persona prompting. "Act as a skeptical senior engineer reviewing a PR. Find problems." A critical persona overrides the default helpful-agreeable persona.
- Use two-pass review. First ask the model to generate, then: "Now critique your own answer. What might be wrong?" Splitting generation from evaluation produces more honest assessment.
- Ask for confidence levels. "Rate your confidence 1-10 for each claim. For anything below 7, explain your uncertainty." Forces the model to distinguish what it knows from what it's guessing.
- Never use confirmation questions. "This is correct, right?" and "Does this make sense?" are leading questions that trigger agreement. Use "Is this correct? If not, explain why." instead.
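Two of the counters above lend themselves to a small helper: the two-pass review and the ban on leading or confirmation questions. This is a sketch, not a vetted tool. `call_model` is a hypothetical callable you would wire to your own provider, and the regex patterns are a crude heuristic that will miss many phrasings.

```python
import re
from typing import Callable

# Critique wording taken from the checklist above.
CRITIQUE_PROMPT = (
    "Now critique your own answer. What might be wrong? "
    "Rate your confidence 1-10 for each claim. "
    "For anything below 7, explain your uncertainty."
)

# Heuristic patterns for leading or confirmation questions (illustrative, not exhaustive).
LEADING_PATTERNS = [
    re.compile(r"\bright\?", re.IGNORECASE),
    re.compile(r"\bisn't\b", re.IGNORECASE),
    re.compile(r"does this make sense\?", re.IGNORECASE),
]


def find_leading_phrases(prompt: str) -> list[str]:
    """Return the leading phrases found in a prompt so it can be reworded before sending."""
    found = []
    for pattern in LEADING_PATTERNS:
        match = pattern.search(prompt)
        if match:
            found.append(match.group(0))
    return found


def two_pass_review(task: str, call_model: Callable[[str], str]) -> dict[str, str]:
    """Generate first, then ask for a critique in a separate call."""
    draft = call_model(task)
    critique = call_model(f"{task}\n\nYour previous answer:\n{draft}\n\n{CRITIQUE_PROMPT}")
    return {"draft": draft, "critique": critique}
```

`find_leading_phrases("This is correct, right?")` flags the prompt, while "Is this correct? If not, explain why." passes clean. Running the critique as its own call keeps evaluation separate from generation, which is the point of the two-pass counter.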
Anthropic: Building Effective Agents - Anthropic's guide to agent architecture patterns: prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer. The reference document for understanding how the tools you use daily are structured.
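Two of the named patterns, prompt chaining and routing, reduce to very little code. The sketch below is our own illustration rather than code from Anthropic's guide; `call_model`, `classify`, and the handler names are hypothetical placeholders.

```python
from typing import Callable, Optional

Model = Callable[[str], str]


def chain(
    steps: list[str],
    user_input: str,
    call_model: Model,
    gate: Optional[Callable[[str], bool]] = None,
) -> str:
    """Prompt chaining: each step's output becomes the next step's input.

    Each step is a template containing `{input}`. The optional gate is a
    programmatic check between steps that stops the chain on a bad result.
    """
    result = user_input
    for number, template in enumerate(steps, start=1):
        result = call_model(template.format(input=result))
        if gate is not None and not gate(result):
            raise ValueError(f"gate rejected the output of step {number}")
    return result


def route(
    user_input: str,
    classify: Callable[[str], str],
    handlers: dict[str, Callable[[str], str]],
    default: str,
) -> str:
    """Routing: classify the input, then hand it to a specialized handler."""
    label = classify(user_input)
    handler = handlers.get(label, handlers[default])
    return handler(user_input)
```

In practice `classify` is often itself a cheap model call and each handler is its own chain. Keeping the control flow in ordinary code, with the model filling in the steps, is what makes these workflows easier to debug than a fully autonomous agent loop.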

Context Engineering (Pillar 1)

Claude Code Documentation - Official docs covering CLAUDE.md, rules files, context management, and project configuration. Start here.
Context Engineering Guide - Theoretical framework for context engineering as a discipline. Covers what goes into the context window, how to structure it, and why placement matters.
12-Factor Agents: Own Your Context Window - Practical implementation of context ownership as an engineering discipline. Frames context management as a design decision, not an afterthought.
awesome-cursorrules - Community collection of rules files. Useful for seeing how other teams structure their AI context.

Planning Before Code (Pillar 2)

OpenSpec.dev - Spec-first development tool for Claude Code. Follows the specify, clarify, plan, task, implement workflow. Lighter on tokens, tuned for existing codebases.
SpecKit - Heavier spec-first tool for greenfield projects. Generates comprehensive context through deep codebase research. Token-expensive but catches implicit requirements.

Prompt Engineering (Pillar 3)

Anthropic Prompt Engineering Documentation - Comprehensive reference covering all prompting techniques from basic to advanced. The primary reference for our daily driver model.
OpenAI GPT-4.1 Prompting Guide - OpenAI's guide for GPT-4.1 (April 2025). Model-specific, but the sections on agentic workflows, long context (1M tokens), and instruction following transfer across models. Worth reading for a cross-model perspective.
Prompting Techniques Reference - Well-organized index of all named prompting strategies with explanations: zero/few-shot, chain-of-thought, ReAct, Reflexion, and more. The field's shared vocabulary in one place.
Image Prompting Guide - Techniques for multi-modal image prompting. Relevant for UI work, visual debugging, and diagram interpretation.
Meta-Prompting Masterclass - Deep dive on using LLMs to generate, critique, and refine prompts. Covers the prompt generator, prompt critic, and prompt evolution patterns. Daily-use technique for anyone writing complex prompts.
Verbalized Sampling - Technique for nudging models away from default word choices by specifying probability. Useful for content generation and getting diverse variations from the same prompt.

The AI as Collaborator (Pillar 4)

Mozilla AI: Owning Code in the Age of AI - Essential reading on the tension between AI's code generation speed and human comprehension. Frames the code ownership problem clearly.
AmazingCTO: AI Code Ownership - Practical framework for code ownership policies when AI is writing significant portions of your codebase. Covers the "you generate it, you own it" principle and migration strategies.
Programmatic Tool Calling Patterns - Analysis of Sonnet 4.6's approach to tool execution. Useful for understanding how agentic AI reasons about when and how to use tools.
OpenAI: Harness Engineering - Interesting read on how engineering teams structure AI-assisted workflows at scale.

Guardrails and Quality (Pillar 5)

Google Cloud: Five Best Practices for Using AI Coding Assistants - Google's recommendations for integrating AI coding tools into team workflows. Practical and well-grounded.
Claude Code Hooks - Reference for AI-specific hooks that catch issues during agent loops, not just at human-review time. Pattern transfers to Cursor and other tools that expose comparable lifecycle hooks; the principle is "give the AI a tight feedback loop so it self-corrects in-session."
Semgrep Rules Guide - How to write the static-analysis rules that catch AI-introduced patterns linters miss (insecure defaults, deprecated APIs, framework misuse). Custom rulesets become a project-specific guardrail layer that scales as your AI-generated code volume grows.
12 Factor App - The classic set of software engineering best practices. Following these makes your codebase more AI-friendly by default (clean structure, explicit dependencies, config in environment).
pre-commit Framework - Multi-language hook framework for enforcing linting, formatting, secret scanning, and custom checks on every commit. The non-negotiable mechanical layer beneath any team-wide guardrail strategy.
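A custom check for pre-commit can be a short script: the framework passes the staged file names as arguments and treats a non-zero exit code as a failed hook. The sketch below assumes that wiring and uses two illustrative patterns only; a dedicated secret scanner covers far more cases, and the pattern names are our own.

```python
import re
from pathlib import Path

# Illustrative patterns only; real scanners ship hundreds of detectors.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings


def main(paths: list[str]) -> int:
    """Scan each file and return 1 if anything was found, else 0."""
    failed = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        for line_no, name in find_secrets(text):
            print(f"{path}:{line_no}: possible {name}")
            failed = True
    return 1 if failed else 0


# As a hook entry point, the script would end with:
#   import sys
#   sys.exit(main(sys.argv[1:]))
```

Because the check runs on every commit, it also runs on every commit an agent makes, which is what turns it from a human-review aid into a mechanical guardrail.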

Verification and Security (Pillar 6)

OWASP Top 10 for LLM Applications - Industry-standard framework for LLM security risks. Covers prompt injection, data leakage, and insecure output handling. Required reading for anyone shipping AI-integrated systems.
Anthropic: Measuring Agent Autonomy - Research on Claude Code's agent behavior in practice. Key takeaway: models should recognize their own uncertainty.
Latent Space: Are Code Reviews Dead? - Thought-provoking analysis of how AI changes code review. Not necessarily gospel, but a useful framework for thinking about where human review still matters most.

Workflow and Tooling (Pillar 7)

MCP: Introduction - The official MCP introduction. Covers the protocol's purpose and how it relates to function calling, with links into the spec when you're ready to go deeper.
Martin Fowler: Function Calling with LLMs - Thorough deep dive on function calling architecture, including security considerations and design patterns. Fowler's treatment is the most complete single resource on this topic.
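The core loop of function calling is small: the model emits a structured request, and the host decides whether and how to run it. The sketch below is a provider-neutral illustration, not code from Fowler's article or any vendor SDK. The JSON shape, the `get_weather` tool, and its canned reply are all hypothetical.

```python
import json
from typing import Any


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call an external service."""
    return f"Sunny in {city}"


# Allowlist of callable tools and the argument types each one accepts.
TOOLS: dict[str, dict[str, Any]] = {
    "get_weather": {"fn": get_weather, "params": {"city": str}},
}


def dispatch(tool_call_json: str) -> dict[str, Any]:
    """Validate a model-emitted tool call against the allowlist, then execute it."""
    try:
        call = json.loads(tool_call_json)
    except json.JSONDecodeError:
        return {"error": "invalid JSON"}
    if not isinstance(call, dict):
        return {"error": "tool call must be a JSON object"}

    spec = TOOLS.get(call.get("name"))
    if spec is None:
        return {"error": f"unknown tool: {call.get('name')}"}

    args = call.get("arguments", {})
    if not isinstance(args, dict) or set(args) != set(spec["params"]):
        return {"error": "arguments do not match the tool's parameters"}
    for key, expected_type in spec["params"].items():
        if not isinstance(args[key], expected_type):
            return {"error": f"argument {key!r} has the wrong type"}

    return {"result": spec["fn"](**args)}
```

The model never executes anything itself; it only proposes a call. Treating that proposal as untrusted input, with an allowlist and argument validation before execution, is the security posture the function-calling literature keeps returning to.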
MCP Protocol - The Model Context Protocol specification. Understanding MCP is essential for extending AI capabilities with external tools.
MCP Apps: Bringing UI to MCP Servers - MCP Apps let tools return interactive UI components (dashboards, forms, visualizations) that render directly in the conversation. This is where MCP stops being just a data pipe and becomes a full application layer. Understanding MCP Apps changes how you think about what your AI tools can do.
MCP Apps Specification - The technical spec and examples for building MCP servers that serve interactive UIs. Includes starter templates for React, Vue, Svelte, Preact, Solid, and vanilla JavaScript.

Continuous Evolution (Pillar 8)

Agent Skills (agentskills.io) - The open Agent Skills standard. Originally developed by Anthropic, now an open standard adopted by Claude Code, Cursor, Codex, GitHub Copilot, VS Code, Gemini CLI, and ~30 other agents. Includes the Quickstart, full Specification, and Client Showcase. Read this first to understand the format that's becoming the cross-tool default.
The Complete Guide to Building Skills for Claude - Anthropic's guide to creating reusable skills. Worth reading both for using existing skills and for building custom ones for your team.
Simon Willison: Agentic Engineering Patterns - Living document modeled after Gang of Four design patterns, but for working with coding agents. Pattern-shaped entries on red/green TDD with agents, templating, and "hoarding" useful prompts. Updates as the field evolves; worth following as a primary practitioner source.
Anthropic: How AI Assistance Impacts Coding Skills (2026) - Anthropic's research on how delegation patterns affect comprehension, with the detailed methodology and findings. The headline figure is referenced in Pillar 11, but the full paper has the more nuanced takeaways for engineers thinking about how to structure their own practice.
METR: AI Impact on Developer Productivity (2025) - The perception gap study: experienced developers believed AI made them 20% faster while measured outcomes showed they were 19% slower. Read this when calibrating your own self-assessment of how AI is changing your workflow.
RTK (Rust Token Killer) - CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies. Practical for managing costs on heavy Claude Code usage.
IPE Newsletter: The AI Stack War - Analysis of closed vs. open AI platforms. Useful context for understanding where the tooling ecosystem is heading.

Evaluation and Measurement (Pillar 9)

promptfoo - Open-source prompt evaluation framework. Define test cases, run them against multiple prompts or models, compare results. Our recommended eval tool for testing rules files, prompt variations, and model selection.
promptfoo Documentation - Getting started guide for setting up your first eval suite. Covers assertions, test cases, and CI integration.
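The idea behind an eval suite (test cases, assertions, several prompts compared side by side) can be sketched in plain Python. This is not promptfoo's API or config format, which the documentation above covers. `call_model`, the prompt names, and the case structure are hypothetical.

```python
from typing import Callable


def run_eval(
    prompts: dict[str, str],
    cases: list[dict],
    call_model: Callable[[str], str],
) -> dict[str, float]:
    """Return the pass rate of each prompt template across all test cases.

    Each case has `vars` (values for the template) and `asserts`
    (functions that take the model output and return True or False).
    """
    if not cases:
        raise ValueError("at least one test case is required")
    scores = {}
    for name, template in prompts.items():
        passed = 0
        for case in cases:
            output = call_model(template.format(**case["vars"]))
            if all(check(output) for check in case["asserts"]):
                passed += 1
        scores[name] = passed / len(cases)
    return scores


# Hypothetical prompt variants and a single test case.
PROMPTS = {
    "terse": "Be terse. {question}",
    "plain": "{question}",
}
CASES = [
    {
        "vars": {"question": "What is 6*7?"},
        "asserts": [lambda out: "42" in out, lambda out: len(out) <= 5],
    },
]
```

The same loop works for rules files and model selection: hold the cases fixed, vary one thing, and compare pass rates instead of eyeballing individual outputs.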
Verbalized Sampling - Technique for generating diverse output variations by specifying probability parameters. Useful for comparing output quality across prompt structures.

Data Hygiene and IP (Pillar 10)

GitGuardian 2026 State of Secrets Sprawl - Definitive data on secret leak rates in AI-assisted development. AI-assisted commits leak secrets at roughly double the baseline rate. Also documents 24,000+ secrets exposed in MCP configuration files.
OWASP Top 10 for LLM Applications - Sensitive information disclosure ranks #2 on their list. The full framework covers both input-side and output-side data risks.
CrowdStrike: Data Leakage as AI's Plumbing Problem - Analysis of input-side data leakage risks in AI coding tools. Covers how developers inadvertently share source code with hardcoded API keys and proprietary algorithms through coding assistants.

Knowing When NOT to Use AI (Pillar 11)

Simon Willison: Hallucinations in Code Are the Least Dangerous Form of LLM Mistakes - Counterintuitive argument that obvious hallucinations are easy to catch. The real danger is code that runs correctly but implements the wrong logic. Essential framing for the "looks right, is wrong" failure mode.
MIT: Roadblocks to Autonomous Software Engineering (2025) - MIT's mapping of where AI specifically fails: handling large codebases (millions of lines), keeping global architectural coherence while generating locally correct code, and avoiding hallucinated patterns that violate internal conventions. The most concrete inventory of the limits.
Thoughtworks Radar: Complacency with AI-Generated Code - Technology Radar entry codifying automation bias, anchoring bias, and review fatigue as recognized patterns. Useful for naming the failure mode in team discussions.
InfoWorld: How to Keep AI Hallucinations Out of Your Code - A Microsoft senior engineer's categorization of common AI failure types: code that doesn't compile, overly convoluted code, self-contradicting functions, and calls to functions that don't exist. Pattern-recognition reference for code review.
Ten Simple Rules for AI-Assisted Coding in Science - Academic paper with practical rules that apply well beyond scientific computing. Covers problem preparation, context management, testing, and quality assurance with AI. Strong on verification discipline and knowing when AI is the wrong tool.

Cross-cutting Frameworks

Resources that span multiple pillars and provide broader context for AI-assisted development.

Google DORA 2025: State of AI-Assisted Software Development - Data-driven analysis of how AI tools affect team performance based on ~5,000 respondents. Key finding: AI amplifies what is already there. Strong teams get stronger, struggling teams get worse. Useful for calibrating expectations.
NIST AI Risk Management Framework - Federal framework for managing AI risks in software systems. Heavy reading, but essential context if you work in regulated industries or government contracts.
Thoughtworks Technology Radar - Twice-yearly assessment of emerging technologies including AI-assisted development tools and practices. Good signal-to-noise ratio on what is production-ready vs. what is still experimental.