The Human in the Loop — What Real User Feedback Actually Changes
Every post in this series has described what we built and how we built it. This post is about what changed it. Not the AI, not the tooling, not the methodology frameworks — the feedback. The real-time, in-session, sitting-with-a-product feedback from Peter, from users, from the act of watching someone try to use something we thought was done.
One of the most clarifying experiences in twelve months of building was the moment we started doing what I now call RAID sessions — Rapid AI Iterative Development — with an actual user in the room. Not sending requirements to the AI and waiting. Not reviewing output alone. Sitting with someone who needed the product to work, watching them use it, and feeding their reactions directly back into the build session in real time. The speed of improvement in those sessions was an order of magnitude faster than any other mode of working. What had been hours of debugging became minutes. Features that seemed complete turned out to be wrong in ways that no amount of solo review would have found. And the AI, redirected immediately with specific observations rather than reconstructed reports written hours later, produced dramatically better corrections on the first attempt.
This post documents what we learned about that loop: what kinds of feedback move things most, what kinds stall progress, the patterns we can now identify in retrospect, and what we would have done differently if we had understood the dynamics from the start.
The Feedback That Mattered Most
Not all feedback is equal. After enough iterations across enough products, a clear taxonomy emerged: some feedback unlocked step-change improvements, some produced incremental refinement, and some — delivered the wrong way — caused the AI to produce worse output than it had before the feedback was given. Learning to tell which is which is one of the most valuable skills in the entire vibe coding practice.
The feedback that moved things most shared a common structure: it described an observed outcome and a desired outcome, without prescribing the technical solution. “When I search for an article by keyword, nothing comes back even though I know there are articles on that topic” is high-value feedback. It tells the AI exactly what is happening and exactly what the expectation is. The AI can diagnose the gap — is the search index stale? is the tokenization wrong? is there a case sensitivity issue? — and propose a fix. Compare that to “the search is broken, try a different approach.” Same problem, much lower information content. The AI has to guess at both the failure mode and the fix, and its guesses are less reliable than its diagnosis when given complete information.
The second category of high-value feedback was what I came to call state verification feedback: catching a gap between what had been requested and what had actually been executed. The most dramatic example in the entire development history was the copy-versus-move discovery during the codebase reorganization.
We had spent a session reorganizing the entire ITI project structure — moving products from internal directories to their proper homes, consolidating shared libraries, cleaning up a folder hierarchy that had grown organically across twelve months of rapid building. The AI executed the reorganization. The output looked correct. The files were in the right places. But Peter caught something: the original directories still existed. The AI had copied the files, not moved them. The reorganization appeared complete on the destination side while the source side remained intact — a state that would have been invisible in normal testing because everything we needed was accessible from the new locations.
The feedback that surfaced this was a single observation: “It looks like the files were copied rather than moved. Have you deleted the original copies after confirming the move was successful?” That one question triggered a cleanup of approximately 24 gigabytes of duplicated content and a verification pass that confirmed the source directories were fully removed. Caught before any external system referenced the new structure. If that feedback had not been given — if the assumption had been that the AI’s execution matched the request — the duplicate state would have persisted indefinitely, creating confusion every time anyone navigated the project.
What made that feedback effective: it was specific, it identified a gap between stated intent (“move”) and observed outcome (“copy behavior”), and it asked a verifying question rather than issuing a corrective command. The question format matters. “Have you deleted the originals?” is more effective than “Delete the originals” because it prompts the AI to verify its own state before acting — surfacing whether it had already done so, whether there was a reason it hadn’t, and whether there were dependencies that made deletion non-trivial.
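The copy-versus-move check described above is mechanical enough to automate. Here is a minimal Python sketch of that verification, written for this post rather than taken from the actual project; the function name and the idea of returning a problem list are my own framing of the pattern:

```python
from pathlib import Path

def verify_move(src: Path, dst: Path) -> list[str]:
    """Return a list of problems found after a supposed move from src to dst.

    A surviving source tree is the copy-versus-move signature: the
    destination looks complete while the originals were never deleted.
    """
    problems = []
    if src.exists():
        leftover = sum(1 for p in src.rglob("*") if p.is_file())
        problems.append(f"source still present ({leftover} files): {src}")
    if not dst.exists():
        problems.append(f"destination missing: {dst}")
    return problems
```

An empty return is the "move confirmed" state; anything else is exactly the kind of observed-versus-expected gap worth feeding back into the session.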
The Architecture Conversation as Feedback Loop
Post 2 introduced the architecture conversation — the pre-build discussion where the AI describes what it is about to build before writing any code. What became clear over time is that this conversation is itself a feedback loop, and the quality of feedback given during it determines the quality of everything that follows.
The pattern that produced the best outcomes: after the AI proposed an architecture, asking it to walk through a specific user scenario against that architecture. “Walk me through what happens when a content editor opens the plugin on Monday morning, searches for healthcare AI articles from the last 48 hours, and tries to queue four of them for publishing.” That walk-through forced the architecture into contact with reality before any code existed. The AI would trace the scenario through its proposed file structure and either complete it cleanly — validating the design — or hit a gap it hadn’t anticipated and correct it in the proposal rather than in the build.
In the AI News Cafe plugin development, the architecture conversation walk-through revealed that WordPress’s object caching layer would serve stale results to the article retrieval function under the intended query pattern. That was a 10-minute conversation. The alternative — discovering it after the plugin was built — would have been a multi-hour debugging session involving cache invalidation logic that affected four different parts of the codebase simultaneously. The ratio of conversation time to avoided debugging time was approximately 1:20.
The feedback that unlocked this wasn’t technical. It was the act of making the AI narrate its proposed system against a concrete scenario. The concreteness is what does the work. Abstract architecture review misses the things that only become visible when a real person with a real workflow tries to move through a real system.
Stop-and-Restart: The Most Underrated Control Signal
One of the most valuable patterns in the entire session history is one that looks like a failure from the outside: stopping the AI mid-task and restarting with a different direction. In the transcript record across twelve months, “stop the previous activity” appears multiple times at exactly the moments where the most significant redirections happened.
The natural tendency when working with an AI that is mid-build is to let it finish and then correct. The rationale is intuitive: the AI has already done some of the work, stopping wastes that effort, it is probably close to done. This rationale is usually wrong, and the cost of following it is disproportionate to the cost of stopping.
When an AI is building in a direction that is structurally incorrect — wrong architecture, wrong scope, wrong interpretation of the requirements — every additional step it takes embeds the error more deeply. Each file it creates, each function it writes, each dependency it establishes is built on the incorrect foundation. Correcting a half-built wrong system is almost always harder than stopping, resetting context, and building the right system from scratch. The sunk cost of the wrong work is small compared to the compounding cost of building on it.
The feedback pattern that enabled clean stops and restarts: a clear signal that the current direction was wrong, followed immediately by a description of the correct direction. Not “stop and wait for further instructions” — that leaves the AI in an ambiguous state where it does not know whether to hold context or release it. “Stop the current build. We need to take a different approach. Here is what I actually need.” The second sentence reorients the session before the AI has a chance to propose a continuation of the stopped work.
In the reorganization session, the stop-and-restart happened three times as the scope and approach clarified through iteration. Each restart produced a cleaner, more targeted execution than would have been possible in a single uninterrupted pass. The cost of the stops — a few minutes each — was paid back immediately by the reduced correction work in each subsequent phase.
Patterns of Good and Bad Feedback: The Taxonomy
After enough iterations, the feedback patterns that reliably produced step-change improvements versus the ones that reliably produced confusion or regression became identifiable. The taxonomy:
High-value feedback:
Observed-versus-expected: “When I do X, I get Y. I expected Z.” This format gives the AI both the failure mode and the success criterion simultaneously. It can diagnose the gap without having to guess at either end.
State verification: “Before we proceed, can you confirm that [specific state] is true?” Verification questions catch gaps between intended and actual execution before those gaps compound. They work especially well for multi-step operations where intermediate states are not visible in the final output.
Scope clarification: “The current build includes [X]. I need it to also handle [Y] and [Z], but [W] is out of scope for this version.” Explicit scope statements — especially the out-of-scope list — prevent the AI from gold-plating features that have been deliberately deferred and from building assumptions about features that haven’t been specified.
Reference verification: “Can you confirm all references to [X] have been updated across the codebase?” After structural changes — file moves, renames, architectural refactors — the AI’s automatic reference updating is reliable for the files it directly modified but may miss indirect dependencies. Explicit verification requests catch these. In the reorganization, this caught 50-plus documentation references that had not been updated automatically.
Low-value feedback:
Vague dissatisfaction: “This doesn’t feel right” or “Try a different approach.” Without specificity about what is wrong and what right looks like, the AI has no information to act on. It will produce a different output, but not necessarily a better one.
Solution-prescribing without problem-stating: “Add a caching layer” when the actual problem is slow query performance. The AI will add the caching layer as specified, even if the real bottleneck is somewhere the cache doesn’t reach. Describing the problem — “the article retrieval is slow on queries returning more than 50 results” — lets the AI propose and evaluate solutions, and its diagnosis is often more complete than the human’s first instinct about the fix.
Post-hoc corrections to already-embedded errors: “That implementation is wrong, please fix it” after an architectural decision has propagated across many files. The better intervention point was the architecture conversation, before any code existed. When this feedback pattern appears repeatedly, it is a signal that the architecture conversation step is being skipped or abbreviated.
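The reference-verification item in the taxonomy above can also be made mechanical rather than conversational. This is an illustrative Python sketch, not code from the project: it scans a documentation tree for any file that still mentions a pre-move path, so each stale reference can be listed and handed back to the AI as specific feedback.

```python
from pathlib import Path

def stale_references(root: Path, old_path: str,
                     exts: tuple[str, ...] = (".md", ".txt")) -> list[tuple[str, int]]:
    """Return (relative file path, line number) pairs that still mention old_path."""
    hits = []
    for f in sorted(root.rglob("*")):
        if not f.is_file() or f.suffix not in exts:
            continue
        for i, line in enumerate(f.read_text(encoding="utf-8").splitlines(), start=1):
            if old_path in line:
                hits.append((str(f.relative_to(root)), i))
    return hits
```

A non-empty result converts a vague "did you update everything?" into the high-value form: these exact files, these exact lines, still reference the old location.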
What We Would Have Done Differently
Looking back at the full iteration history, the most expensive failures share a common structure: they were detectable earlier in the feedback loop than they were caught. The copy-versus-move issue was detectable at the point of the original move request, if the request had included an explicit verification requirement. The over-scoping of AI News Cafe v1 was detectable in the requirements document, if there had been an explicit v1-versus-later scoping discipline from the start. The reference update gaps were detectable immediately after each structural change, if verification had been systematic rather than assumed.
The change we would have made from the beginning: every significant operation should have an explicit confirmation checkpoint before it is considered complete. Not “tell me when you’re done” — “confirm the following before we proceed: [list of specific states that should be true].” This moves verification from an afterthought — something done when something seems wrong — to a structural part of every operation. The AI’s execution is reliable; its reporting of its own execution is also reliable when asked specific questions. The gap is between what was requested and what was executed, and that gap closes almost entirely when explicit verification is built into the workflow rather than added when problems surface.
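The "confirm the following before we proceed" checkpoint can be expressed as a small runner over named conditions. This is a sketch of the pattern, under the assumption that each checkpoint can be phrased as a yes/no check; the names and paths below are hypothetical:

```python
from pathlib import Path
from typing import Callable

def run_checkpoints(checks: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Evaluate named conditions and return the names of the ones that failed.

    An empty list means every stated condition held, so the operation
    can be treated as complete.
    """
    return [name for name, check in checks if not check()]

# A checkpoint list for a directory move (paths are hypothetical):
move_checks = [
    ("source directory removed", lambda: not Path("old/location").exists()),
    ("destination populated",    lambda: Path("new/location").exists()),
]
```

The value is less in the code than in the discipline it encodes: the checkpoint list is written before the operation runs, not improvised after something looks wrong.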
The second change: sit with users earlier. The RAID session model — real-time feedback from a real user during a live build — produced more useful signal in an hour than days of solo review produced across the products we built without it. The barrier to doing this was mostly psychological: a sense that the product needed to be further along before it was ready to show anyone. That sense is almost always wrong. A half-built product shown to a real user reveals which half matters and which half can wait. A fully built product shown to a real user for the first time reveals which finished features were built wrong and need to be rebuilt. The earlier the user is in the loop, the smaller the rework.
The third change: build the iteration infrastructure before building the products. The shared library, the agent skills, the context files — all of these were built reactively, after the pain of not having them was felt repeatedly across multiple products. If those infrastructure investments had been made first, the compounding effect would have started from product one rather than product fifteen. The methodology described in this series is the methodology we arrived at through iteration. The right answer was to design that methodology explicitly at the start and build to it, rather than discover it incrementally through the cost of its absence.
The Feedback Loop as a Core Competency
The frame that captures all of this: the feedback loop is not a support function for the build process. It is the build process. The AI generates code. The human evaluates it. The feedback from that evaluation is the primary input that determines what the AI generates next. Everything else — the requirements documents, the architecture conversations, the shared libraries, the context management — is infrastructure for making that feedback loop faster, higher-quality, and more durable.
Professionals who develop the skill of giving high-quality feedback to AI systems — specific, observed, outcome-focused, with explicit verification — will compound that advantage across every product they build and every task they complete with AI collaboration. It is a skill that transfers across domains, improves with practice, and becomes more valuable as AI systems become more capable. More capable AI makes better use of better feedback; the ceiling on what the feedback loop can produce rises with every generation of AI improvement.
The human is in the loop not as a fallback for AI failure but as the source of the judgment that makes AI output useful. That role does not diminish as AI gets better. It evolves — from correcting errors to directing quality, from catching bugs to setting standards, from debugging output to architecting outcomes. The loop gets faster, but the human’s place at the center of it does not change.