Building the First Product — What We Got Right and What We Got Wrong
The first substantial product built using AI collaboration was the AI News Cafe plugin — a WordPress application that aggregated content from multiple news sources, processed it through Claude to generate editorial summaries at multiple lengths, classified content by topic, and provided an editorial workflow for content teams. It was the right kind of first project: real, complex enough to stress-test the methodology, and consequential enough to learn from.
It was also too ambitious for a first project. That was the first lesson, and it is worth stating clearly before anything else: scope discipline in a vibe coding practice is entirely the human’s responsibility, and it has to be imposed from the beginning, before any build sessions begin. The AI will build whatever you describe. It has no mechanism for pushing back on scope. If you do not govern scope, no one does.
The Over-Scope Problem
The initial requirements document for the AI News Cafe included: real-time content ingestion from multiple sources, AI-generated summaries at three different lengths, topic classification, editorial workflow with multiple review states, an admin panel, audience segmentation, and a notification system. In a traditional development project, a project manager or senior developer would have recommended phasing this into at least three releases.
The AI built what was described. In approximately three weeks, there was a product that substantially did everything in the requirements document. The aggregation worked. The summaries worked. The admin panel worked. The workflow was partially implemented. The notification system was stubbed. And the product had subtle bugs in multiple places because the surface area was too large to test systematically: we kept finding bugs in one feature while hunting for bugs in another.
The fix took two weeks of debugging for a product that could have been delivered in two weeks as a clean, stable v1 with a narrower scope. The lesson: write a v1 list and a later list before any requirements are written. Not after. Before. The v1 list contains what must work before the product is useful at all — the minimum viable set. The later list contains everything else. The AI builds the v1 list first. Testing and stabilization happen before the later list is touched. This discipline, applied from the first product, would have avoided one of the most expensive mistakes in the portfolio.
The Architecture Conversation
The pattern that emerged from the first project and held consistently across all twenty products: before any code is written, have an explicit architecture conversation. Ask the AI to describe what it is about to build — what files it will create, how data will flow between them, what the database structure will look like, what the major architectural decisions are. Review that description. Ask questions. Redirect before building begins if anything looks structurally wrong.
This conversation does several things simultaneously. It surfaces assumptions — both yours and the AI’s — before they are embedded in code. It gives you a window to apply domain knowledge the AI lacks: “this plugin will be installed on shared hosting servers with limited memory, so we cannot load all articles into memory at once.” It establishes a shared mental model of the system that makes all subsequent debugging faster — when something breaks, you both know what the intended structure was and can reason about where the deviation occurred.
In the AI News Cafe build, the architecture conversation revealed that WordPress's caching layer would interfere with real-time content display unless data retrieval was structured carefully. Catching that in a ten-minute conversation, rather than in a multi-hour debugging session three days later, proved typical: architecture conversations consistently returned five to ten times their duration in avoided debugging time.
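One common way to reconcile a caching layer with near-real-time content is a short time-to-live for the freshest queries. The post does not detail the actual fix (which would have been PHP, likely WordPress transients), so this Python sketch only illustrates the pattern; all names here are invented:

```python
import time

class ShortTTLCache:
    """Entries expire quickly so near-real-time data stays fresh.

    A sketch of the pattern only, not WordPress code: in WordPress
    the equivalent would be transients with short expirations.
    """

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = time.time()
        entry = self._store.get(key)
        if entry and entry[0] > now:
            return entry[1]          # fresh enough: serve the cached value
        value = fetch()              # stale or missing: refetch from source
        self._store[key] = (now + self.ttl, value)
        return value
```

The point of surfacing this in the architecture conversation is that the cache policy becomes an explicit decision (how stale is acceptable?) instead of an accidental property discovered in debugging.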
What Makes a Good Requirements Document
The requirements document is the primary quality lever in any vibe coding session. Over twelve months and twenty products, requirements documents evolved substantially — and the quality of first-pass AI output evolved in direct proportion. The components that proved most important:
Purpose statement: One paragraph describing what the product does and who uses it, written from the user’s perspective rather than the feature list perspective. “This plugin helps a content editor working alone to manage a high-volume AI news curation workflow without needing to read every article” tells the AI something different — and more useful — than “this plugin displays AI-generated article summaries.”
User scenarios: Two to five concrete descriptions of a real person doing a specific thing with the product. “A content editor opens the plugin on Monday morning, searches for articles tagged ‘healthcare AI’ published in the last 48 hours, reviews the AI summaries, selects four articles for the editorial queue, and marks the others as reviewed.” This is more useful than a feature list because the AI can trace the entire technical chain required to support the scenario, surfacing requirements that weren’t stated explicitly.
Technical constraints: What platform, what performance requirements, what it must not do, what external services it connects to. These prevent technically correct but contextually wrong implementations — building a real-time system for a platform that doesn’t support real-time well, for example.
The v1 and later split: Explicit and non-negotiable. The v1 list is what gets built. The later list gets acknowledged and set aside.
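Taken together, these components amount to a simple structure that can be checked mechanically before a build session starts. A minimal sketch, with illustrative field names rather than any standard format:

```python
from dataclasses import dataclass, field

@dataclass
class RequirementsDoc:
    """Skeleton of the requirements document described above.

    Field names and checks are illustrative, not a standard.
    """
    purpose: str                                      # one paragraph, user's perspective
    scenarios: list = field(default_factory=list)     # 2-5 concrete user scenarios
    constraints: list = field(default_factory=list)   # platform, performance, must-nots
    v1: list = field(default_factory=list)            # must work before product is useful
    later: list = field(default_factory=list)         # acknowledged and set aside

    def validate(self):
        problems = []
        if not (2 <= len(self.scenarios) <= 5):
            problems.append("need 2-5 user scenarios")
        if not self.v1:
            problems.append("v1 list is empty")
        if not self.later:
            problems.append("later list missing: scope was not split")
        return problems
```

A `validate()` pass before any build session is a cheap way to enforce the v1/later discipline that the first product lacked.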
The Iteration Loop in Practice
The AI builds a feature. You test it in a real environment. You find what is broken or wrong. You describe it specifically: not “the search doesn’t work” but “when I search for a keyword that contains a hyphen, the results come back empty — but the same keyword without the hyphen returns the expected results.” The more specific the description, the faster the fix and the lower the chance of the fix introducing a different problem.
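The hyphenated-keyword failure described above is typical of a tokenizer mismatch between the indexing path and the query path. The actual plugin code is not shown in this post, so this is a hypothetical reconstruction of the bug class and its fix:

```python
import re

def tokenize_for_index(text):
    # The indexer splits on all non-word characters, so "real-time"
    # is stored as the two tokens ["real", "time"].
    return re.findall(r"\w+", text.lower())

def search_buggy(query, index_tokens):
    # Bug: the raw query string, hyphen and all, is compared against
    # tokens that by construction can never contain a hyphen.
    return query.lower() in index_tokens

def search_fixed(query, index_tokens):
    # Fix: normalize the query with the SAME tokenizer as the index,
    # then require every query token to be present.
    return all(t in index_tokens for t in tokenize_for_index(query))
```

Notice that the precise bug report ("hyphenated keyword returns empty; the same keyword without the hyphen works") points almost directly at this diagnosis, which is exactly why specificity shortens the fix.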
This specificity discipline improves over time. After a few months of writing precise issue descriptions for an AI, you naturally think more analytically about software behavior — what should happen, what is happening, what the gap between them is. That analytical thinking is a transferable professional skill that applies to any software you use or manage, not just software you build.
The iteration loop ran faster than in any prior experience with human development teams. A bug described in the morning was typically fixed by mid-morning. A feature built in a morning session was tested by noon and, if needed, corrected before the afternoon session began. Running four complete build-test-correct cycles in a single day was achievable; in a traditional team, a single cycle typically takes days to a week. The compression is qualitative, not just quantitative: it changes the character of the work.
The Signal We Missed
Halfway through the AI News Cafe build, something happened that would repeat many times before its significance was recognized: the AI rebuilt, in slightly different form, components that had been built before. The API integration connecting to Claude's language model was functionally identical to one from a previous project. The admin panel followed a pattern that would be reinvented in the next five products. The database table structure resembled one from an earlier build.
At the time, this felt like consistency — the AI was applying good patterns. In retrospect, it was a signal: these components should be shared, not independently rebuilt for each product. The same API client, built six times with slight variations, means fixing a bug in the API client six times. The same admin panel pattern, rebuilt five times, means any improvement discovered in one product doesn’t propagate to the others.
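A shared client is the obvious remedy. As a sketch of what "build once, import everywhere" could look like (the class name, payload shape, length budgets, and model string are all placeholders, not the actual API or library):

```python
class SummaryClient:
    """One shared wrapper for the LLM summarization calls that were
    independently rebuilt per product. Everything here is illustrative.
    """

    LENGTHS = {"short": 50, "medium": 150, "long": 400}  # approximate word budgets

    def __init__(self, api_key, model="model-name-placeholder"):
        self.api_key = api_key
        self.model = model

    def build_request(self, article_text, length="medium"):
        if length not in self.LENGTHS:
            raise ValueError(f"unknown summary length: {length}")
        # Fixing a bug here (say, a wrong token budget) fixes it once,
        # for every product that imports this client, instead of six times.
        budget = self.LENGTHS[length]
        return {
            "model": self.model,
            "max_tokens": budget,
            "prompt": f"Summarize this article in about {budget} words:\n{article_text}",
        }
```

The payoff is exactly the propagation the post describes: an improvement discovered while building one product lands in all the others at the next import.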
The shared library that eventually addressed this problem was not built until after the tenth product. It should have been started after the second. That delay is covered in detail in Post 4. The point here is that the signal was available from the beginning of the journey. Recognizing what the signal meant, and responding to it, would have been the highest-leverage intervention available.
What Done Looks Like
One of the harder early disciplines was establishing a working definition of done. The AI can always add more features, refine the UI, optimize performance. The speed of building makes it easy to keep building indefinitely — “just one more thing” in twenty minutes is always available. Without a clear definition of done applied consistently, products drift in an extended build cycle rather than shipping.
The definition that worked: a product is done for its current version when all v1 requirements are met, the primary user scenarios work without errors when walked through manually in a real environment, and the code is deployed somewhere a real user could access it. Not perfect. Not extended with later-list features. Deployed and working for the v1 use case.
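That definition is mechanical enough to express as a checklist predicate. A minimal sketch, assuming you track requirement and scenario status as booleans:

```python
def is_done(v1_requirements, scenario_results, deployed):
    """Mechanical form of the 'done' definition above.

    v1_requirements: dict of v1 requirement -> bool (met?)
    scenario_results: dict of primary scenario -> bool (manual walkthrough passed?)
    deployed: bool (accessible to a real user?)
    """
    return (
        bool(v1_requirements) and all(v1_requirements.values())
        and bool(scenario_results) and all(scenario_results.values())
        and deployed
    )
```

Anything on the later list is deliberately absent from the inputs: it cannot block, and it cannot extend, the current version.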
The first product reached this point after about three weeks. By the end of the year, with better requirements documents, shared components, and established patterns, a similar product reached the same threshold in three to five days. That trajectory — three weeks to three days for equivalent product complexity — is the compound effect of consistent methodology improvement applied across twenty products over twelve months.
How Recent AI Innovations Change This Picture
The mistakes documented in this post — scope creep, insufficient requirements, underestimating the iteration budget required for integration work — are human mistakes. The AI innovations that have shipped since this work was done do not eliminate the human discipline required to avoid them. But they do change the cost and recovery profile when those mistakes happen.
Checkpoints, introduced in the Claude Agent SDK, let you save code state and rewind to previous versions automatically. This addresses one of the most painful failure modes described in this post: building in the wrong direction for multiple sessions before realizing the architecture needed to change. With checkpoints, rewinding to a known-good state becomes a cheap, routine operation rather than a costly judgment call about how much to manually undo. The ability to experiment more aggressively with architecture changes, knowing you can always step back, changes the risk calculus on architectural decisions made early in a project.
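The essence of checkpointing is snapshot-and-rewind over mutable state. This toy store illustrates the semantics only; it is not the Claude Agent SDK's API:

```python
import copy

class CheckpointStore:
    """Minimal illustration of checkpoint/rewind semantics: save
    snapshots of project state, restore any earlier one on demand.
    """

    def __init__(self):
        self._snapshots = []

    def checkpoint(self, state):
        # Deep-copy so later mutations cannot corrupt saved snapshots.
        self._snapshots.append(copy.deepcopy(state))
        return len(self._snapshots) - 1   # checkpoint id

    def rewind(self, checkpoint_id):
        # Return a fresh copy of the saved state; the snapshot itself
        # stays intact so you can rewind to it again later.
        return copy.deepcopy(self._snapshots[checkpoint_id])
```

Once every risky change sits between two cheap snapshots, "try the bigger refactor" stops being a multi-session gamble.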
Background tasks in the Claude Agent SDK mean that long-running build processes no longer require synchronous waiting. During the first product builds in this case study, sessions involved significant waiting time while builds ran, environments were set up, or dependencies were installed. Background tasks let those processes run asynchronously while other work continues. The effect on session productivity is meaningful: less idle time waiting for processes, more overlap between build and planning cycles.
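The underlying pattern is a fire-and-forget worker whose result is collected only when it is actually needed. A minimal Python illustration of that pattern, not the SDK's actual background-task interface:

```python
import queue
import threading
import time

def run_in_background(task, *args):
    """Run a long task on a worker thread and return a handle whose
    .result() blocks only at the moment the value is needed.
    """
    q = queue.Queue(maxsize=1)

    def worker():
        q.put(task(*args))

    threading.Thread(target=worker, daemon=True).start()

    class Handle:
        def result(self, timeout=None):
            return q.get(timeout=timeout)

    return Handle()
```

The session-productivity gain comes from the gap between starting the task and calling `.result()`: that is the window in which planning or testing work overlaps with the build.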
Perhaps most relevant to this post's lessons about first-product mistakes: the 1-million-token context window means that feeding a complete, detailed requirements document to the AI, including all the edge cases, user scenarios, and constraint documentation that was often abbreviated for context-length reasons, is no longer a tradeoff. A full requirements document no longer competes with code for context space, and first-build quality tends to improve when the AI works from complete requirements rather than abbreviated ones.