How the Latest AI Innovations Would Have Changed Everything We Built

Every case study has a limitation: it documents what was possible with the tools that existed when the work was done. The products built in this series were built with models and tooling that have since been substantially upgraded. This post documents what we would have done differently with the AI innovations that have shipped since this work began, using specific examples from actual products we built.

This is not speculation about some future state of AI. These are features available now, in February 2026, through Anthropic’s platform and the broader MCP ecosystem. If you are starting an AI collaboration practice today, this post is your briefing on the most consequential differences and what they mean for structuring your practice from day one.

What Has Actually Changed

The innovations that matter most for the methodology described in this series fall into five categories: Agent Teams, Agent Skills, the Model Context Protocol, the 1-million-token context window, and extended thinking. Each addresses a specific limitation of the original methodology. Together, they represent a platform that is qualitatively different, not just quantitatively faster.

Agent Teams (experimental, Opus 4.6+) allow multiple Claude Code instances to work simultaneously — one as team lead, others as domain specialists — with direct peer-to-peer communication and shared task lists. The manual inter-session handoff coordination of the original methodology is replaced by built-in inter-agent coordination.

Agent Skills (available since October 2025) are reusable, version-controlled folders containing instructions, scripts, and resources that persist across sessions and compose together. They replace manually-loaded CLAUDE.md context files with platform-native, automatically-loaded shared knowledge.

The Model Context Protocol (MCP) is an open standard for connecting AI applications to live external systems — databases, APIs, version control, project management tools. Instead of the human manually exporting and pasting external data, Claude queries it directly. The AI’s knowledge of your systems is always current, not a snapshot from the last time you pasted something in.

The 1-million-token context window (beta for Sonnet 4 and Opus 4.6) extends the original 200,000-token limit five-fold. A session that previously held the active product plus a carefully pruned subset of the shared library can now hold the entire project universe simultaneously.

Extended thinking lets Claude reason through complex problems before generating output, with the reasoning visible in reviewable “thinking blocks.” Complex architectural decisions no longer require elaborate prompt engineering to elicit good reasoning — they get it by default when extended thinking is enabled.

What We Would Have Built Differently: The Shared Library

The shared library — the `ITI/shared/` infrastructure described in Post 5 — was the most valuable infrastructure built in this case study and the most expensive to build because we built it retrospectively. We built the same authentication system five times, the same database abstraction three times, the same API client pattern four times before consolidating them into a shared library that all products could reference.

Starting today, the shared library would be built as a set of Agent Skills from day one. Each domain of shared knowledge — authentication patterns, database patterns, API integration patterns, WordPress plugin structure, desktop app architecture — would be a separate skill. The WordPress plugin skill would contain the canonical plugin file structure, the standard activation/deactivation hooks, the options page pattern, the database migration pattern, and the admin notice pattern. Every new product would load that skill automatically, ensuring consistency without the human manually checking that conventions were being followed.

The composability of Skills means the WordPress plugin skill could reference but not duplicate the authentication skill and the database skill. A new product requiring WordPress + database + authentication would load three skills that compose into the complete pattern library for that product type. What we built as a single large CLAUDE.md context document — fragile, manually maintained, easy to let drift stale — becomes a modular, version-controlled, composable system that the platform manages.
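As a concrete sketch, the WordPress plugin skill might live in a folder such as `skills/wordpress-plugin/`, with a `SKILL.md` entry point alongside supporting files like `plugin-structure.md` and scaffold scripts. The SKILL.md-with-YAML-frontmatter convention comes from Anthropic's published Agent Skills format; the skill name, description, and contents below are hypothetical:

```markdown
---
name: wordpress-plugin
description: Canonical WordPress plugin patterns (activation/deactivation
  hooks, options page, database migrations, admin notices). Use when
  creating or modifying a WordPress plugin.
---

# WordPress Plugin Skill

- Follow the file layout in plugin-structure.md.
- Register activation/deactivation hooks using the standard pattern.
- For credential handling, apply the `authentication` skill rather than
  duplicating its patterns here.
```

The `description` field is what the platform uses to decide when to load the skill, which is why it names both the patterns covered and the situations that call for them; the body stays short and delegates detail to the supporting files.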

The practical difference: the four months we spent building and consolidating the shared library retrospectively would have been two weeks of initial skill definition, applied to the first product and refined with each subsequent one. The 50-70 percent reduction in code duplication we eventually achieved would have been present from product one, not discovered after building twenty products and counting the cost.

What We Would Have Built Differently: The Multi-Agent System

Post 5 describes the multi-agent system we assembled: an Orchestrator Agent, a Pattern Agent, an API Integration Agent, a QA Agent, and a Documentation Agent — each implemented as a separate Claude session with distinct system prompts and context files. The human coordinated handoffs between them: finishing work in the Orchestrator session, summarizing the output, opening the QA session, pasting the relevant context, and directing the quality review.

That manual coordination overhead was substantial. A build-test-review cycle that could have been completed in a single parallel operation required five sequential session switches, each with context-loading overhead. On a complex build with five coordination cycles, the overhead was two to three hours of session management per day — time spent transferring context between agents rather than building products.

With Agent Teams, the same multi-agent architecture runs natively. The Orchestrator is the team lead session. The Pattern Agent, QA Agent, and Documentation Agent are teammates working in parallel. The human defines the requirements, hands them to the Orchestrator/team lead, and receives consolidated output. Inter-agent coordination — currently the human’s job — is handled by the team lead through built-in messaging channels.

For the specific products in this case study, the impact would have been most visible on the RAG architecture products. The AI chatbot and knowledge base system required simultaneous work on the ingestion pipeline, the vector database integration, the retrieval logic, and the user interface — four domains that were built sequentially in the original methodology because each required a separate focused session. With Agent Teams, all four could build in parallel with the frontend agent consuming an interface specification that the backend agent is simultaneously implementing. Calendar time for RAG products would have been 40-60 percent lower. And the interface consistency problems that emerged from sequential builds — where the frontend was built against an interface spec that the backend subsequently changed — would have been caught in real-time through inter-agent communication rather than discovered during integration testing.

What We Would Have Built Differently: Context Management

Context management — deciding what to include in a session, what to leave out, how to maintain project continuity across sessions — was a constant discipline in the original methodology. Post 4 documents “context debt” as one of the most expensive hidden costs of the practice: the accumulated cost of sessions that operated with incomplete context producing subtly wrong decisions that compounded over time.

The 1-million-token context window does not eliminate the need for context management discipline. Even with 1 million tokens available, choices about what to include in a session remain relevant: a larger window is not a license to load everything, and the most important context should still be explicitly surfaced rather than buried in a mass of background material. But it eliminates the worst manifestations of context debt: the architecture decision made without visibility into a relevant constraint in a different product, the naming convention inconsistency that developed because the naming guide didn’t fit in the session context alongside the active build, the security requirement that was abbreviated out of the context and quietly dropped.

With a 1-million-token window, our shared library — all of it, including every product’s architecture documentation, every established pattern, every integration specification — would have been present in every session simultaneously. The cross-product consistency checking that currently requires a dedicated session to perform would have been continuous: the AI seeing the full picture every time it made a decision, flagging inconsistencies as they emerged rather than letting them accumulate.

The extended prompt caching feature — maintaining context across session breaks for up to 60 minutes — addresses the session restart overhead that was a daily friction in the original methodology. The shared library context loaded at the start of a morning session would still be live after a break for lunch, without requiring re-establishment. For a practice that runs multiple sessions per day, the accumulated time saved from reduced session startup overhead is material.

What We Would Have Built Differently: External System Integration

Every product in the portfolio touched external systems: WordPress databases, REST APIs, payment processors, email services, analytics platforms, vector databases. In the original methodology, the AI’s knowledge of these systems was limited to whatever documentation the human manually included in the session context — typically the relevant API documentation sections, manually copied and pasted, or summarized by the human from memory.

MCP changes this fundamentally. An MCP database connection lets the AI query the live database schema before writing any database-touching code. Instead of the human describing the schema from memory — with the inevitable omissions and slightly wrong field names — the AI sees the actual current schema and writes code that matches it exactly. The class of errors that came from AI-generated code referencing field names that didn’t exist, or making assumptions about data types that were wrong, would have been structurally prevented.
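As a minimal sketch, a project-level `.mcp.json` in Claude Code could register a database server this way. The `mcpServers` key is the Claude Code convention; the server name, package, and connection string here are illustrative stand-ins (this example uses the reference Postgres MCP server, whereas a WordPress installation would need a MySQL-capable server):

```json
{
  "mcpServers": {
    "project-db": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://localhost:5432/products"
      ]
    }
  }
}
```

With a connection like this in place, a session can inspect the live schema before generating database-touching code, instead of relying on a pasted or remembered description of it.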

For the WordPress plugins specifically, an MCP connection to the local WordPress installation would have given the AI access to the live list of active plugins, the current database tables, the registered hooks, and the active theme’s template structure. Rather than describing the WordPress environment from documentation and the human’s knowledge, the AI could have verified its assumptions against the live environment before writing code that would run in it. The integration testing cycles that consumed significant time in every plugin build would have been shorter, because more of the integration assumptions would have been validated before the build rather than corrected after it.

For the AI chatbot and knowledge base products, MCP connections to the vector database would have allowed the AI to query the actual state of the knowledge base — what embeddings exist, what the retrieval quality looks like on test queries, what gaps remain — as part of the development process, rather than requiring the human to run test queries manually and report results back to the AI session.

What We Would Have Built Differently: Requirements and Quality

Post 4’s analysis of the requirements gap — the AI building exactly what was specified rather than what was needed — documented one of the most consistent sources of expensive rework. The diagnosis: requirements documents abbreviated edge cases and integration assumptions because writing them comprehensively took more time than seemed justified for what looked like a straightforward feature.

Extended thinking addresses this from the AI side. Rather than accepting requirements as written and building from them directly, extended thinking lets Claude reason through the requirements before generating an architecture proposal: identifying unstated assumptions, flagging ambiguous specifications, questioning edge case handling that the requirements leave undefined. The architectural review — already the most valuable part of a build session — becomes a more thorough gap analysis before any code is written.

In practice, on a product like the WordPress SEO analysis plugin, extended thinking would have surfaced the assumption gap around multisite WordPress installations before the entire plugin was built assuming single-site installations. The requirement said “analyze WordPress pages.” Extended thinking would have prompted: “Does this include multisite sub-sites? The implementation differs significantly.” That question answered before the build saves the rework cycle that happened after it.

The Files API — allowing requirements documents to persist across sessions as AI-accessible artifacts — addresses scope drift. With requirements stored and explicitly referenced throughout a build, the AI flags when proposed changes violate documented requirements before implementing them. Scope creep becomes visible before it is built in rather than after.

The Architecture for Today’s Practice

Based on the analysis above, here is how the methodology described in this series would be structured starting from scratch today:

Foundation: Agent Skills as the shared library. Before the first product is built, invest one to two weeks defining core skills — shared code patterns, testing standards, documentation conventions. Version-controlled, auto-loaded, and composable, they replace the manual CLAUDE.md context file system entirely.

Integration: MCP connections to live systems. Set up MCP connections to primary external systems — the database, version control, primary APIs — before building products that touch them. The AI’s knowledge of those systems becomes current and queryable, not dependent on manually maintained documentation.

Execution: Agent Teams for complex builds. Use Agent Teams when a product involves multiple simultaneous concerns — frontend, backend, integration, testing. The team lead receives requirements and coordinates; specialists execute in parallel; the human reviews consolidated outputs. Single-concern builds use a single session with relevant skills loaded.

Reasoning: Extended thinking for architectural decisions. Enable extended thinking for architecture reviews, requirements analysis, and technical debt assessment — decisions where the reasoning matters as much as the conclusion. Use the thinking blocks as review artifacts: correct assumptions you disagree with before the build starts, not after.

Context: 1-million-token window for cross-project work. Maintain the full shared library in context for sessions that make cross-product decisions. Single-product builds can use product-specific context. Portfolio-level architectural decisions should use the full context the 1-million-token window enables.

What Does Not Change

The innovations described in this post are substantial, but they do not change the parts of the practice that rest on human judgment.

Domain expertise in the directing role remains essential. More capable AI produces more plausible outputs — and plausible-but-wrong is more dangerous than obviously-wrong because it is harder to detect. Extended thinking makes AI reasoning more visible; it does not replace the human’s need to evaluate whether that reasoning is sound. The professional with deep domain expertise evaluates outputs better and gets dramatically better results. This advantage compounds with AI capability improvements rather than being replaced by them.

Systematic practice over ad hoc use remains the difference between linear and compounding improvement. Agent Skills, Agent Teams, and MCP connections are infrastructure investments. Their full value comes through consistent practice that builds on them, not through occasional use. The compounding dynamics described throughout this series apply equally to current innovations: build systematic context now, and the infrastructure pays back across every session that follows.

Rigorous evaluation at every step remains non-negotiable. The failures documented in this series were almost always detectable at the evaluation step — if testing was rigorous and integration assumptions were verified rather than assumed correct. No AI innovation evaluates output for you. They improve what gets built; the human still determines whether what was built is right.

The Practical Starting Point

If you are beginning an AI collaboration practice today, or extending an existing one to take advantage of current innovations, the order of operations matters. Start with Agent Skills — not because they are the most exciting innovation, but because they are the foundation everything else builds on. A well-designed skill library makes every subsequent session more consistent, more efficient, and more aligned with your established conventions. Two weeks of skill definition saves months of context management overhead.

Add MCP connections to the external systems your work touches most frequently. For most professionals, this means a database connection, a version control connection, and connections to one or two primary workflow tools. The setup cost is a few hours per connection with current tooling. The return — AI that sees your live systems rather than your description of them — pays back immediately in reduced integration debugging time.

Experiment with Agent Teams on a build that genuinely benefits from parallel specialization. Not every build does. A simple feature addition to an existing product does not need an agent team. A new product with distinct frontend, backend, and integration concerns does. Start there, where the parallel structure of the work matches the parallel execution capability of the teams feature.

Enable extended thinking for architectural decisions from the start. The cost is higher per token, but architectural decisions made poorly are paid for many times over in rework. Budget extended thinking for the decisions that matter most — initial architecture design, major refactoring choices, integration approaches for complex external systems — and use standard models for production tasks where the reasoning is straightforward.

The methodology described in this series worked with tooling that was substantially more limited than what exists today. The same methodology, applied with current innovations available from the start, would have produced more products, faster, with fewer of the expensive failures that made this case study instructive. Build on what is documented here — and build it with the tools that exist now.
