The Honest Risk Assessment — What Can Actually Go Wrong With Vibe Coding
Any methodology that promises dramatically better results than the status quo should be examined for its risks. Vibe coding is no exception. The time compression and cost efficiency documented in the previous posts are real. So are the risks. This post names them clearly, explains what each risk means in practice, and describes what mitigation is available — without either dismissing risks as unimportant or treating them as arguments against using the methodology.
Risk 1: Technical Debt Accumulation at Velocity
Technical debt is the accumulated cost of speed-quality trade-offs: code that works but is fragile, inconsistently structured, poorly documented, or difficult to extend safely. Every development practice accumulates some technical debt. Vibe coding accumulates it faster than traditional development because velocity is the primary value being optimized, and velocity trade-offs typically go against thoroughness.
The specific mechanism in vibe coding: the AI produces technically correct implementations of what was specified. It does not produce defensive implementations of what was not specified — error handling for edge cases not mentioned in the requirements, documentation for decisions not explicitly captured, test coverage for scenarios not included in the brief. The gap between “what was specified” and “what should have been specified” fills with technical debt at a rate proportional to how fast you are moving.
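The gap is easiest to see in code. A minimal sketch, using a hypothetical price-parsing requirement (the function names and inputs here are illustrative, not from the portfolio): the first function is a correct implementation of what was specified; the second handles the inputs nobody wrote down.

```python
from typing import Optional

# Hypothetical illustration. The brief said "parse the price field", so the
# AI implements exactly that -- correct for every input that was specified.
def parse_price_as_specified(raw: str) -> float:
    return float(raw.strip().lstrip("$"))

# The defensive version handles the cases nobody wrote down.
def parse_price_defensively(raw: Optional[str]) -> Optional[float]:
    if raw is None:
        return None
    cleaned = raw.strip().lstrip("$").replace(",", "")
    if not cleaned:
        return None
    try:
        value = float(cleaned)
    except ValueError:
        return None  # "N/A", "free", and other inputs no one specified
    return value if value >= 0 else None  # negative prices were never specified either
```

Every branch in the second version that is missing from the first is a unit of technical debt the specified version carries silently.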
The evidence from the portfolio is concrete: the duplicate API client problem (six-plus implementations before the shared library) was technical debt. The sparse CLAUDE.md files in early products were documentation debt. The absence of automated testing was testing debt. None of these were catastrophic, but all required real time to address and all reduced the quality of the products they affected.
Mitigation is straightforward in principle and requires discipline in practice: establish quality gates that are non-negotiable regardless of velocity. Context file updated at the end of every session. Shared library used for any component type built more than twice. Testing completed before the next feature is built. These three disciplines, applied consistently, prevent most technical debt accumulation. The cost is approximately twenty percent of total session time in overhead. The benefit is a codebase that remains maintainable as the portfolio scales.
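The three gates can be made mechanical rather than aspirational. A minimal sketch, with assumed field names and the "built more than twice" threshold taken from the text; a real version would read this state from the repository rather than take it as input:

```python
from dataclasses import dataclass

@dataclass
class SessionState:
    context_file_updated: bool      # CLAUDE.md touched this session
    duplicate_component_count: int  # times this component type has been built
    uses_shared_library: bool
    tests_passed: bool

def quality_gates(state: SessionState) -> list:
    """Return the list of gates that block starting the next feature."""
    failures = []
    if not state.context_file_updated:
        failures.append("context file not updated this session")
    if state.duplicate_component_count > 2 and not state.uses_shared_library:
        failures.append("component built more than twice without shared library")
    if not state.tests_passed:
        failures.append("testing not completed before next feature")
    return failures
```

An empty return means the session can close; anything else is overhead that must be paid before new feature work begins.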
Risk 2: Security Gaps in Generated Code
AI-generated code generally follows security best practices — when those practices are common enough to be in the training data and when the context makes them obviously applicable. SQL injection prevention, input sanitization, proper session management, CSRF protection: these appear in AI-generated code reliably for standard WordPress patterns. They appear less reliably for non-standard patterns, novel integrations, or any situation where the security requirement needs to be inferred rather than explicitly specified.
For products handling low-sensitivity data on standard platforms, the security risk from AI-generated code is manageable. For products handling financial information, health data, authentication credentials, or any personally identifiable information subject to regulatory requirements, AI-generated code requires explicit security specification and deliberate review before production deployment.
Mitigation: include security requirements explicitly in requirements documents for any sensitive product. Not “the product should be secure” — that is not actionable. Specific requirements: “Input from users must be sanitized using WordPress’s sanitize_text_field function before any database operation. The admin panel must verify nonces on all form submissions. API keys must never be output to any user-visible page.” After the build, walk through a security checklist specific to the platform and data type. Verify, do not assume.
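Part of that post-build walkthrough can be automated. A deliberately naive sketch: flag lines of generated WordPress code that read request input without an obvious sanitization wrapper on the same line. The sanitizer list is a small assumed sample, and a real review checklist goes far beyond single-line pattern matching:

```python
import re

# Matches raw reads of WordPress request superglobals.
RAW_INPUT = re.compile(r"\$_(GET|POST|REQUEST)\b")
# A small, assumed sample of acceptable wrappers -- not an exhaustive list.
SANITIZERS = ("sanitize_text_field", "absint", "sanitize_email", "wp_verify_nonce")

def flag_unsanitized_lines(php_source: str) -> list:
    """Return (line_number, line) pairs that read input with no sanitizer in sight."""
    findings = []
    for number, line in enumerate(php_source.splitlines(), start=1):
        if RAW_INPUT.search(line) and not any(fn in line for fn in SANITIZERS):
            findings.append((number, line.strip()))
    return findings
```

A check like this catches instruction drift cheaply; it does not replace reading the code with the platform's security checklist in hand.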
Risk 3: Context Dependency and Skill Atrophy
When an AI produces most of the code in a practice, the human practitioner exercises code-writing skills less frequently. If the practitioner does not compensate for this deliberately, those skills weaken over time — and with them, the ability to evaluate AI output technically rather than just experientially.
This matters because evaluation quality is correlated with production quality. The ability to identify whether an AI-generated security implementation is actually secure requires understanding security implementations. The ability to catch a subtle bug in an AI-generated algorithm requires understanding algorithms. If the production skills atrophy, the evaluation skills atrophy proportionally, and the human side of the collaboration weakens precisely where it is most needed.
Mitigation: periodic production work to keep skills calibrated. Not because it is more efficient than having the AI produce the code — it is not — but because it is professional fitness maintenance. Write code occasionally to remain capable of evaluating code. Build models occasionally to remain capable of evaluating models. The Coding Playbook is also a direct mitigation: actively building understanding of the patterns being used, not just applying them, preserves the conceptual framework that makes evaluation possible even when production skills are not regularly exercised.
Risk 4: AI Provider Dependency
The entire practice documented in this series depends on one AI provider’s API. Anthropic’s pricing, capability, model behavior, and availability directly affect every aspect of the development workflow. There is no equivalent of supplier diversification available when the tool is an AI assistant — switching providers is not like switching component vendors, because the behavior, conventions, and context compatibility differ enough across models to require significant methodology adjustment.
This is not theoretical. AI pricing changed substantially between 2023 and 2025. Model behavior changes with each version update in ways that affect workflow. A session approach that was optimized for one model version may produce noticeably different results with a subsequent version.
Mitigation: document patterns in terms independent of any specific AI behavior. The Playbook, context files, and architectural decisions should be described in plain language that any capable AI system could apply. Avoid building workflow dependencies on idiosyncratic behaviors of a specific model version. Test major workflow components when model versions change, before those changes affect production work. The goal is a practice that is AI-powered but not provider-locked in ways that create single points of failure.
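Avoiding provider lock-in has a code-level analogue: workflow logic that depends on a narrow interface rather than a vendor SDK. A minimal sketch, with entirely hypothetical class and method names, showing the shape of the boundary:

```python
from typing import Protocol

class CodeAssistant(Protocol):
    """The only surface the workflow is allowed to depend on."""
    def complete(self, context: str, prompt: str) -> str: ...

class PortfolioSession:
    """Workflow logic written against the interface, not the provider."""
    def __init__(self, assistant: CodeAssistant) -> None:
        self.assistant = assistant

    def build_step(self, context_file: str, requirement: str) -> str:
        # Patterns travel as plain-language context, so any capable
        # backend can apply them.
        return self.assistant.complete(context=context_file, prompt=requirement)

class StubAssistant:
    """Test double standing in for any provider's API client."""
    def complete(self, context: str, prompt: str) -> str:
        return f"[generated for: {prompt}]"
```

Swapping providers then means writing one adapter that satisfies `CodeAssistant`, not rewriting the workflow.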
Risk 5: Over-Scope Commitment
Because vibe coding is fast, the temptation to commit to ambitious scope is constant. “We can do that” becomes a default because the generation speed makes it feel true. But the gap between generation speed and delivery quality — including testing, integration, security review, and user-acceptance validation — is where this commitment becomes a problem.
The risk compounds in client relationships where the client’s expectations were set based on implied AI-assisted speed without accounting for the validation work that does not compress. Clients do not know or care about AI-assisted development timelines. They know what was committed to, and they measure delivery against that commitment.
Mitigation: scope discipline applied from the first requirement, regardless of how fast the build feels. The same timeline buffers applied in traditional development for external commitments. The split between v1 scope and later versions applied rigorously before any client-facing commitment is made. The cost of under-promising and over-delivering is zero. The cost of the reverse includes relationship damage that no amount of AI speed can repair.
Risk 6: Knowledge Concentration
In a solo vibe coding practice, institutional knowledge about how products work lives primarily in one person. CLAUDE.md files help. The Playbook helps. But they do not completely substitute for the accumulated understanding of the practitioner who worked through every decision firsthand.
A contractor brought in to address a production issue would need significant time to develop the context required to work safely — even with good documentation. The documentation captures decisions and conventions; it cannot fully capture the reasoning path that led to those decisions or the understanding of how they fit together.
Mitigation: write documentation at the level required to hand off to a competent developer tomorrow. Write CLAUDE.md files as if the next person to maintain this product will be new to it. Write README files as if the developer is familiar with the technology stack but not with this specific product. Write decision records as if the reader will not have the contextual understanding that makes the decision seem obvious. This level of documentation takes more effort than documentation written for yourself alone, but it provides genuine insurance against the knowledge concentration risk and produces better documentation quality as a direct output.
The Risk Comparison Table
Rather than treating vibe coding risks in isolation, comparing them against the risks of traditional development for the same product scope provides a more useful decision framework:
| Risk | Vibe coding | Traditional development | Verdict |
| --- | --- | --- | --- |
| Cost overrun | Very low (AI API costs are predictable and low) | High (team scaling, scope changes, and delays all compound cost) | Vibe coding wins decisively |
| Timeline | Low (no coordination overhead, fast iteration) | High (coordination overhead, reviews, and dependencies all extend timelines) | Vibe coding wins decisively |
| Code quality | Moderate (quality discipline required, debt accumulation risk) | Lower with good team practices (code review, dedicated QA) | Traditional wins slightly with good team practices; roughly equal with mediocre ones |
| Security | Moderate (explicit security specification required) | Moderate (depends on team security culture) | Roughly equal with appropriate discipline on both sides |
| Knowledge concentration | High in solo practice | Low with multiple team members | Traditional wins clearly |
| Hiring and staffing | Zero in solo practice | High (hiring failures, turnover, skill gaps) | Vibe coding wins decisively |
| Long-term maintainability | Depends heavily on infrastructure investment | Generally better with dedicated maintenance culture | Traditional wins without infrastructure investment; roughly equal with it |
For the use cases in this portfolio — products at startup or internal tool scale, where cost and timeline are primary constraints — the risk-weighted comparison clearly favors vibe coding when appropriate quality disciplines are maintained.
The Risk That Matters Most in Your Context
Which risk dominates depends on the specific product and organizational context. A product handling sensitive customer financial data in a regulated industry has a security and compliance risk profile that makes the security mitigation requirements in vibe coding non-negotiable — and may make traditional development with dedicated security review the appropriate choice despite its higher cost and longer timeline. A startup building an MVP to test product-market fit has a cost and timeline risk profile that makes vibe coding the clearly appropriate choice, with security investment calibrated to the actual sensitivity of the data involved.
The framework: identify the top two or three risks that matter most for your specific product and context. Evaluate how vibe coding and traditional development compare on those specific risks. Make the methodology choice based on the risk comparison that actually matters for your situation, not on a general assessment of which methodology is better overall. The general assessment is useful for building intuition; the specific assessment is what determines the right choice for each product.
How Recent AI Innovations Change This Picture
The risks documented in this post remain real: technical debt accumulation, security gaps in generated code, skill atrophy, provider dependency, over-scope commitment, and knowledge concentration. AI innovations do not eliminate these risks; they shift some and introduce new ones worth knowing about.
On the security risk: Claude Sonnet 4.6 has demonstrated improved consistency on instruction following, which is the mechanism through which security-relevant constraints (input validation, authentication requirements, privilege separation) are consistently applied. A model that more reliably follows explicit security instructions produces fewer security gaps from instruction drift. This does not replace human security review — it makes that review more likely to find the AI has correctly implemented what it was asked to, rather than drifted in a dangerous direction.
Agent Teams introduce new coordination risks. When multiple AI instances are working in parallel, the surface area for inconsistent implementation increases. A frontend agent and a backend agent working simultaneously can produce code that is individually correct but inconsistent at the interface. Human oversight of agent team outputs requires attention to integration seams, not just individual component correctness. The risk is manageable with clear interface specifications before parallel work begins, but it requires an additional layer of review discipline.
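The interface specification that makes parallel agent work safe can be a small, concrete artifact. A sketch with illustrative field names: both agents receive this contract before work begins, and integration review checks their outputs against it rather than against each other.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class OrderSummaryContract:
    """Agreed response shape for a hypothetical order-summary endpoint."""
    order_id: str
    total_cents: int   # integers, never floats, for money
    currency: str      # ISO 4217 code, e.g. "USD"
    status: str        # one of: "pending", "paid", "refunded"

def conforms(payload: dict) -> bool:
    """Naive integration-seam check: does a response carry exactly the agreed fields?"""
    expected = {f.name for f in fields(OrderSummaryContract)}
    return set(payload) == expected
```

A frontend agent that renders `total_cents` and a backend agent that emits `total` are each individually correct; a check like this surfaces the seam mismatch before a human has to debug it.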
Over-reliance risk increases as AI capability increases. The better the AI becomes at producing plausible output, the harder it is to detect when that output is wrong but plausible. Extended thinking makes AI reasoning more transparent — the thinking blocks show how Claude reached a conclusion, not just what the conclusion was. Reviewing the reasoning, not just the output, is an additional check that becomes available with extended thinking enabled. This is a new risk-management practice that did not exist with earlier models.