A widely circulated analysis is puncturing one of enterprise AI's favorite narratives -- the idea that AI coding tools turn engineering organizations into high-output 'software factories.' The argument: AI dramatically speeds up the writing of code, but the downstream parts of the software lifecycle -- testing, code review, deployment safeguards and quality control -- don't automatically scale with it, so the net result for many companies is simply shipping bugs and incidents faster.
The data gives the thesis teeth. Figures cited from Faros AI show task throughput per developer up 33.7% and pull-request merge rate up 16.2% -- real productivity gains. But over the same period, the ratio of incidents to pull requests jumped 242.7% and bugs per developer rose 54%. In other words, defects are climbing far faster than output, which suggests that AI-accelerated coding without a commensurate investment in quality controls can erode reliability rather than improve it.
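To make the gap concrete, the cited percentages can be converted into growth multipliers and compared. The short Python sketch below is a back-of-the-envelope illustration, not part of the original analysis: it assumes all four figures cover the same teams and period, and it treats the 16.2% rise in merge rate as a proxy for growth in merged pull-request volume, which the source does not confirm. The helper names are hypothetical.

```python
# Back-of-the-envelope comparison of the cited Faros AI figures.
# ASSUMPTIONS (not from the source): all four percentages describe the same
# teams over the same period, and the merge-rate increase is used as a proxy
# for growth in the number of merged pull requests.

def growth_factor(pct_change: float) -> float:
    """Convert a percentage change (e.g. 33.7) into a multiplier (1.337)."""
    return 1.0 + pct_change / 100.0

def pct_change(factor: float) -> float:
    """Convert a multiplier back into a percentage change."""
    return (factor - 1.0) * 100.0

throughput = growth_factor(33.7)         # tasks per developer
merged_prs = growth_factor(16.2)         # merged PRs (proxy, see assumptions)
incidents_per_pr = growth_factor(242.7)  # incidents per pull request
bugs_per_dev = growth_factor(54.0)       # bugs per developer

# Bugs grew faster than tasks, so bugs per completed task went up.
bugs_per_task = bugs_per_dev / throughput

# More PRs, each carrying more incidents: the two effects multiply.
total_incidents = incidents_per_pr * merged_prs

print(f"Bugs per task:   {pct_change(bugs_per_task):+.1f}%")
print(f"Total incidents: {total_incidents:.2f}x the baseline")
```

Under those assumptions, bugs per completed task rise by roughly 15% and total incident volume comes out at about four times the baseline. The multiplication is the point: a higher per-PR incident ratio compounds with higher PR volume rather than merely adding to it.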
The finding lands amid a broader reckoning over how to actually capture value from generative AI in the enterprise. The early phase was about adoption and raw productivity metrics; the maturing phase is about whether that productivity translates into better, more reliable software or just more churn. The question draws on the same discipline behind efficient agent memory and rigorous agent evaluation -- the unglamorous engineering that separates production systems from impressive demos.
The competitive implication is a new market opportunity. If writing code is no longer the bottleneck, verifying it becomes the constraint -- creating demand for AI-native testing, automated code review, observability and reliability tooling. Agent-evaluation startups drawing fresh venture funding, alongside established devops and quality vendors, are positioned to sell the guardrails that AI-accelerated teams now need. The shift of the bottleneck from writing code to verifying it is itself an investable thesis.
The bear case for the alarm: a single vendor's dataset can be unrepresentative, the quality dip may be a transitional growing pain as teams adapt their processes, and better AI review tools could close the gap. What to watch: whether independent studies corroborate the incident surge, how engineering leaders rebalance investment toward quality controls, and whether 'AI for verifying code' becomes as big a category as 'AI for writing code.'