Vibe Coding Best Practices: What Engineering Teams Should Track

Metrics That Matter

If you are adopting AI coding tools, you need to measure their actual impact rather than relying on developer sentiment, which has been found to overestimate productivity gains by 20-40%.

**Core metrics:**

- **% AI-assisted commits** - Healthy range: 40-50% of committed code. Below 20% suggests adoption friction; above 70% suggests insufficient human oversight.
- **Cycle time reduction** - Benchmark: 18-24% improvement. Measure from commit to deploy, not just the time spent writing code.
- **PR throughput** - DX Q1 2026 data shows daily AI users merge 2.3-4.1 PRs/week depending on the tool (vs 1.4 for non-users).
- **Bug rate by code origin** - Track defects in AI-generated and human-written code separately. CodeRabbit data suggests AI code has 1.7x more issues.
- **Security finding rate** - Track vulnerability density in AI-generated and human-written code separately. Baseline: 45% of AI code contains security flaws.
- **Code churn within 14 days** - AI code that gets rewritten within two weeks indicates quality issues. GitClear reports churn rising from 3.3% to 5.7-7.1%.
- **Time debugging AI code** - Track it explicitly. 66% of developers report spending more time fixing AI output than it saves them.

**DORA metrics remain relevant:** Apply change lead time, deployment frequency, change failure rate, and failed-deployment recovery time to AI-assisted and non-assisted work streams separately.
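A minimal Python sketch of three of these metrics, assuming you already tag each commit as AI-assisted or not. The `ai_assisted` flag, the `Commit` and `Rewrite` records, and how you populate them are hypothetical; in practice they might come from commit trailers your team defines and from blame history. The sketch computes the AI-assisted share, maps it onto the thresholds quoted above, and measures 14-day churn separately by code origin.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

CHURN_WINDOW = timedelta(days=14)


@dataclass
class Commit:
    sha: str
    ai_assisted: bool  # hypothetical flag, e.g. parsed from a commit trailer your team defines
    lines_added: int
    committed_at: datetime


@dataclass
class Rewrite:
    """Lines from an earlier commit that a later commit modified or deleted."""
    source_sha: str
    lines: int
    rewritten_at: datetime


def ai_share(commits: List[Commit]) -> float:
    """Fraction of commits flagged as AI-assisted."""
    if not commits:
        return 0.0
    return sum(c.ai_assisted for c in commits) / len(commits)


def interpret_share(share: float) -> str:
    """Map the AI-assisted share onto the thresholds quoted in the metrics list."""
    if share < 0.20:
        return "possible adoption friction"
    if share > 0.70:
        return "possible insufficient human oversight"
    if 0.40 <= share <= 0.50:
        return "within the quoted healthy range"
    return "outside the healthy range; keep monitoring"


def churn_14d(commits: List[Commit], rewrites: List[Rewrite], ai_assisted: bool) -> float:
    """Share of lines from one origin (AI or human) rewritten within 14 days of commit."""
    group = {c.sha: c for c in commits if c.ai_assisted == ai_assisted}
    added = sum(c.lines_added for c in group.values())
    if added == 0:
        return 0.0
    churned = sum(
        r.lines
        for r in rewrites
        if r.source_sha in group
        and r.rewritten_at - group[r.source_sha].committed_at <= CHURN_WINDOW
    )
    return churned / added


# Usage with made-up data.
t0 = datetime(2026, 1, 5)
commits = [
    Commit("a1", True, 100, t0),
    Commit("b2", False, 100, t0),
    Commit("c3", True, 50, t0),
    Commit("d4", False, 50, t0),
]
rewrites = [
    Rewrite("a1", 9, t0 + timedelta(days=3)),    # inside the 14-day window
    Rewrite("c3", 20, t0 + timedelta(days=30)),  # outside the window, not counted
    Rewrite("b2", 3, t0 + timedelta(days=10)),
]
print(interpret_share(ai_share(commits)))   # within the quoted healthy range
print(churn_14d(commits, rewrites, True))   # 0.06 (9 of 150 AI lines)
print(churn_14d(commits, rewrites, False))  # 0.02 (3 of 150 human lines)
```

Keeping churn split by origin is the point of the design: a single blended number would hide whether AI-written code is being rewritten faster than human-written code.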

Security Practices for AI-Generated Code

With 45% of AI-generated code failing security tests and AI-assisted commits introducing security findings at 10x the rate of manual coding (Cloud Security Alliance, Fortune 50 data), security cannot be optional.

**Mandatory practices:**

1. **Automated security scanning on every PR** - Integrate SAST (Veracode, CodeQL, SonarQube) into CI/CD. Do not rely on developers to manually check AI output for vulnerabilities.
2. **Dependency verification** - Roughly 20% of the packages referenced in AI-generated code do not exist. Automated dependency checking catches "slopsquatting" supply chain attacks, in which attackers register hallucinated package names.
3. **Secrets scanning** - AI tools can inadvertently include or expose credentials. One financial services organisation faced a $2.3M regulatory response after an API key appeared in AI-generated code suggestions.
4. **Security-specific prompting** - Veracode's research shows security pass rates improve when models receive explicit security guidance in prompts. Build security requirements into your team's prompt templates.
5. **Human review for security-critical paths** - Authentication, authorisation, cryptography, payment processing, and data handling must have human security review regardless of how the code was generated.
6. **Regular vulnerability assessments** - Schedule periodic security audits that specifically target AI-generated code segments.
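Dependency verification is straightforward to automate. The sketch below uses a swapped-in technique, an allowlist gate, rather than querying a package registry: every name in a `requirements.txt` file is normalised per PEP 503 and compared against a team-approved set, so a hallucinated or unvetted name fails the check even if an attacker has since registered it. The approved set, the function names, and the simplified parsing (it skips option lines such as `-r` and `-e` and ignores line continuations) are assumptions for illustration, not the behaviour of any particular scanner.

```python
import re
from typing import List, Optional, Set


def normalize(name: str) -> str:
    """PEP 503 normalisation: case-insensitive, and runs of '-', '_' and '.' are equivalent."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line: str) -> Optional[str]:
    """Extract the distribution name from a simple requirements.txt line."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None  # blank line, comment, or an option such as -r / -e / --index-url
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return normalize(match.group(0)) if match else None


def unapproved_dependencies(requirements_text: str, approved: Set[str]) -> List[str]:
    """Return normalised names that are not on the team's approved list."""
    approved_norm = {normalize(n) for n in approved}
    flagged = []
    for raw in requirements_text.splitlines():
        name = requirement_name(raw)
        if name is not None and name not in approved_norm:
            flagged.append(name)
    return flagged


# Usage: "fastjsonparse-utils" is a made-up name standing in for a hallucinated package.
reqs = """
requests>=2.31  # HTTP client
Flask_Login==0.6.3
fastjsonparse-utils==1.0
-r dev-requirements.txt
"""
print(unapproved_dependencies(reqs, {"requests", "flask-login"}))  # ['fastjsonparse-utils']
```

An allowlist is stricter than an existence check: a slopsquatted package does exist on the registry, so only a vetted list (or human approval of additions to it) stops it.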

Code Review for AI-Generated Code

AI-generated pull requests are typically 2-3x larger than human-written ones, which causes reviewer fatigue. The solution is a tiered review approach:

**Tier 1: Automated (every PR)**

- Compile and run tests
- Static analysis (CodeQL, SonarQube)
- Security scanning
- Style and formatting checks
- Dependency verification

**Tier 2: AI-assisted review (every PR)**

- Use AI code review tools to flag logic issues, readability problems, and potential bugs
- GitHub Copilot Enterprise can be configured to review pull requests automatically
- This serves as the "second pair of eyes": most teams treat AI as a reviewer, not an autonomous approver

**Tier 3: Human review (risk-based)**

- All security-critical changes: mandatory human review
- Architectural changes: mandatory senior engineer review
- Data model changes: mandatory review
- Routine CRUD operations: human review optional if Tiers 1 and 2 pass

**Shared coding standards:** Write coding guidelines that work for both AI agents and humans. This means explicit, machine-readable style guides rather than implicit team conventions that AI cannot infer.
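The risk-based routing in Tier 3 can be expressed as a small policy function that a CI job might evaluate for each PR. This is a minimal sketch: the path patterns, review labels, and the `review_plan` function are hypothetical, and treating "security-critical" or "architectural" as a property of file paths is a simplification; some changes will still need manual labelling.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Hypothetical path patterns; adapt them to your repository layout.
# Note: in fnmatch, "*" also matches "/", so "src/auth/*" covers nested files.
SECURITY_CRITICAL = ["src/auth/*", "src/crypto/*", "src/payments/*"]
ARCHITECTURAL = ["src/core/*", "docs/adr/*"]
DATA_MODEL = ["migrations/*", "src/models/*"]


def matches_any(path: str, patterns: List[str]) -> bool:
    return any(fnmatchcase(path, p) for p in patterns)


def review_plan(changed_paths: Iterable[str], tier1_passed: bool, tier2_passed: bool) -> List[str]:
    """Return the human reviews a PR needs under the three-tier scheme."""
    if not tier1_passed:
        return ["blocked"]  # Tier 1 gates every PR; fix automated failures first
    reviews = set()
    for path in changed_paths:
        if matches_any(path, SECURITY_CRITICAL):
            reviews.add("security")
        if matches_any(path, ARCHITECTURAL):
            reviews.add("senior-engineer")
        if matches_any(path, DATA_MODEL):
            reviews.add("data-model")
    if not reviews and not tier2_passed:
        reviews.add("standard")  # routine change, but the AI reviewer raised concerns
    return sorted(reviews)  # an empty list means human review is optional


print(review_plan(["src/auth/session.py", "migrations/0042_add_index.py"], True, True))
# ['data-model', 'security']
print(review_plan(["src/api/orders.py"], True, True))   # []
print(review_plan(["src/api/orders.py"], True, False))  # ['standard']
```

Returning a sorted list keeps the output stable, which makes it easy to post as a PR status or compare in tests.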

Testing Strategies

The Anthropic 2026 Agentic Coding Trends Report recommends "agentic quality control", in which AI reviews and tests AI-generated output as part of the implementation loop.

**Recommended testing approach:**

1. **Generate tests alongside code** - When asking AI to write a feature, include test generation in the same prompt. AI-native testing means tests are written as part of implementation, not as an afterthought.
2. **Automated coverage analysis** - Delegate coverage analysis to AI in CI pipelines, and flag any AI-generated code without corresponding tests.
3. **Flaky test detection** - AI-generated tests can be brittle. Monitor for flaky tests and track whether AI-generated test suites have higher flake rates.
4. **Integration testing emphasis** - AI tends to do well at unit tests and struggles more with integration tests. Prioritise human attention on integration and full-system test scenarios.
5. **Verification against specifications** - Rakuten achieved 99.9% numerical accuracy by verifying AI output against reference implementations. For critical calculations, always verify against known-good results.
6. **Security testing** - Beyond static analysis, run dynamic application security testing (DAST) and penetration testing on AI-generated features before production deployment.
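One way to verify against known-good results is differential testing: run the AI-generated function and a trusted reference on many inputs and flag any disagreement beyond a tolerance. The source does not describe Rakuten's exact procedure, so this is a generic sketch; the loan-payment example, the function names, the `1e-9` relative tolerance, and the input ranges are all assumptions.

```python
import math
import random
from typing import Callable, List, Tuple


def reference_monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Known-good amortised loan payment: P*r / (1 - (1+r)^-n), with r the monthly rate."""
    if annual_rate == 0:
        return principal / months
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)


def candidate_monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Stand-in for an AI-generated version: algebraically equivalent, written differently."""
    r = annual_rate / 12
    if r == 0:
        return principal / months
    growth = (1 + r) ** months
    return principal * r * growth / (growth - 1)


def buggy_monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """A deliberately wrong version that ignores interest."""
    return principal / months


def verify_against_reference(
    candidate: Callable[..., float],
    reference: Callable[..., float],
    trials: int = 1000,
    rel_tol: float = 1e-9,
    seed: int = 0,
) -> List[Tuple[tuple, float, float]]:
    """Compare candidate and reference on seeded random inputs; return any mismatches."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    mismatches = []
    for _ in range(trials):
        # Include the zero-rate edge case in roughly 10% of trials.
        rate = 0.0 if rng.random() < 0.1 else rng.uniform(0.001, 0.25)
        args = (rng.uniform(1_000, 1_000_000), rate, rng.randint(1, 360))
        expected, actual = reference(*args), candidate(*args)
        if not math.isclose(actual, expected, rel_tol=rel_tol):
            mismatches.append((args, expected, actual))
    return mismatches


print(len(verify_against_reference(candidate_monthly_payment, reference_monthly_payment)))  # 0
print(len(verify_against_reference(buggy_monthly_payment, reference_monthly_payment)) > 0)  # True
```

A relative tolerance is used because the two formulas round differently in floating point; exact equality would produce false alarms even for a correct candidate.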

When NOT to Use AI Coding

Knowing when to avoid AI is as important as knowing when to use it.

**Do not use AI for:**

- **High-stakes architectural decisions** requiring organisational context and long-term maintenance considerations
- **Novel security-critical code** (authentication, cryptography, payment processing) without specialist review
- **Tasks requiring institutional knowledge** that is not in the codebase
- **Complex distributed system coordination** where subtle bugs cause cascading failures
- **Regulated code requiring audit trails** unless your governance framework explicitly covers AI-generated contributions
- **Code handling credentials or secrets** - keep sensitive data out of AI model context windows entirely

**Use with caution:**

- **Complex refactoring on unfamiliar codebases** - METR's randomised controlled trial found that experienced developers were 19% slower with AI when working on large repositories they already knew well. On unfamiliar code, the risk may be higher.
- **Database migrations** - Subtle schema changes can cause data loss. Always verify against a staging environment first.
- **Anything you cannot verify** - If you lack the expertise to judge whether AI output is correct, do not ship it to production.

The Anthropic research found developers can "fully delegate" only 0-20% of tasks; the rest require varying degrees of human involvement.
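Keeping credentials out of model context windows is easiest to enforce before the context is assembled. A minimal sketch, assuming your tooling lets you control which files are sent to the model: the deny-list patterns and function names are hypothetical, and a filename filter does not catch secrets pasted into ordinary source files, so it complements secrets scanning rather than replacing it. Many AI coding tools also offer their own ignore or exclusion settings; check your tool's documentation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List

# Hypothetical deny-list of secret-bearing file names; extend it for your repository.
SECRET_FILE_PATTERNS = [".env", ".env.*", "*.pem", "*.key", "*.p12", "id_rsa*", "credentials*.json"]


def is_secret_bearing(path: str) -> bool:
    """True if the file name matches a pattern that usually holds credentials."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in SECRET_FILE_PATTERNS)


def context_safe_files(paths: List[str]) -> List[str]:
    """Drop secret-bearing files before building an AI prompt or context bundle."""
    return [p for p in paths if not is_secret_bearing(p)]


files = [
    "src/app.py",
    "config/.env",
    "config/.env.production",
    "deploy/tls/server.pem",
    "README.md",
]
print(context_safe_files(files))  # ['src/app.py', 'README.md']
```

Matching on the file name alone (not the full path) means the rule applies wherever such a file sits in the tree.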

Process Recommendations for Safe Adoption

Based on data from organisations successfully adopting AI coding tools:

**1. Start with low-risk, high-reward tasks** - Begin with scaffolding, boilerplate, tests, and documentation. This is where AI delivers the clearest productivity gains with the lowest risk.

**2. Establish baselines before adoption** - Measure your current cycle time, bug rate, and security finding rate before rolling out AI tools. Without baselines, you cannot measure actual impact.

**3. Track AI-specific metrics from day one** - Do not wait until problems emerge. Instrument your pipeline to distinguish AI-generated from human-written code and track quality indicators separately.

**4. Invest in prompt engineering** - The quality of AI output depends heavily on how you ask. Develop team-level prompt templates and shared context files (CLAUDE.md, coding guidelines).

**5. Pair AI adoption with security tool investment** - If you are generating 40% or more of your code with AI, your security scanning needs to handle significantly more throughput.

**6. Train for critical evaluation, not just usage** - The biggest risk is developers shipping bad code because they cannot distinguish it from good code. Train senior engineers as AI output reviewers.

**7. Review and update quarterly** - The AI coding landscape changes faster than any other tooling category. Review your approved tools list, governance policies, and metrics every quarter.
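Baselines make the before-and-after comparison concrete. A minimal sketch, assuming you can export per-change commit-to-deploy cycle times: the samples are made up, the median is one reasonable choice of summary statistic (it is less sensitive to outliers than the mean), and the check reuses the 18-24% cycle-time benchmark from the metrics list.

```python
from statistics import median
from typing import List


def percent_change(baseline: List[float], current: List[float]) -> float:
    """Change in the median, as a percentage of the baseline median (negative = decrease)."""
    before, after = median(baseline), median(current)
    if before == 0:
        raise ValueError("baseline median must be non-zero")
    return (after - before) / before * 100


# Hypothetical samples: commit-to-deploy cycle time in hours, one value per change.
baseline_hours = [30, 42, 36, 50, 28]  # captured before rollout (median 36)
current_hours = [26, 29, 31, 35, 40]   # captured after rollout (median 31)

change = percent_change(baseline_hours, current_hours)
print(f"{change:.1f}%")  # -13.9%

# A reduction of 18-24% corresponds to a change between -24% and -18%.
within_benchmark = -24 <= change <= -18
print(within_benchmark)  # False: faster than before, but short of the benchmark
```

Without the baseline sample, the post-rollout median of 31 hours would say nothing about whether the tools helped.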