Can you prove AI ROI in Software Eng?
Synopsis of the talk "Can you prove AI ROI in Software Eng? (Stanford 120k Devs Study)" by Yegor Denisov-Blanch, a Stanford researcher, presented at the AI Engineer Code Summit.
In this data-driven presentation, Yegor Denisov-Blanch shares insights from Stanford’s large-scale, time-series research on AI’s real impact on software engineering productivity. The study analyzes telemetry and outcomes from over 120,000 developers across 600+ companies, using sophisticated measurement techniques (including ML models that approximate expert code reviews for effort, complexity, and maintainability) rather than relying on simplistic metrics like lines of code or PR counts.
Key Findings:
- Modest but real median gains: As of the July 2025 data point, the median productivity uplift from AI coding tools is around 10%. Some sources referencing the talk or related Stanford work cite net gains in the 15–20% range after adjustments, but results vary widely: top-performing teams see gains of 19–25% or more, while others experience near-zero or even negative impact. The gap between high and low performers is widening over time, with a 4x increase in spread from 2023 to 2025.
- High variance and the “rich get richer” effect: Identical AI tools deliver dramatically different results depending on the organization, team practices, and implementation. Early successful adopters compound advantages, while laggards fall further behind. Factors driving better outcomes include quality of usage (not just quantity), clean codebase/maintainability, task type, and integration into workflows.
- Where AI helps most (and least):
  - Greater gains on easy tasks, greenfield projects, and popular languages (e.g., Python, Java: 10–25% uplift).
  - Smaller or negative effects on complex tasks, brownfield/legacy code, and niche or legacy languages (e.g., COBOL, Haskell).
  - AI increases code volume and task completion but often produces bigger, buggier changes, shifting bottlenecks to code review and increasing rework and technical debt later.
- Measurement challenges and common traps: Traditional metrics (PR counts, DORA metrics, survey sentiment, lines changed) are misleading and can create false positives. They measure activity, not true output or business value. The talk emphasizes the need for better proxies of engineering output (effort-adjusted, maintainability-aware) and warns against declaring victory based on short-term speed alone, as it can mask long-term debt.
- ROI playbook and recommendations: Organizations must invest in proper measurement, process changes, and habits (e.g., strong review gates, focus on maintainability, targeted usage for suitable tasks). The presentation provides a step-by-step guide on what to track, biases to avoid, and practices of top-quartile teams to realize meaningful returns.
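The measurement trap described above can be sketched with a toy comparison. The scoring function, weights, and all numbers below are invented purely for illustration (they are not Stanford's actual model): two hypothetical developers tie or diverge depending on whether you count raw activity (PRs) or an effort-adjusted, maintainability-aware proxy.

```python
"""Toy sketch (all numbers hypothetical): an activity metric (raw PR count)
versus a simple effort-adjusted output proxy, in the spirit of the
measurement approach the talk advocates."""


def effort_adjusted_output(prs, maintainability_weight=0.5):
    """Sum of per-PR effort, discounted when maintainability is poor.

    Each PR carries an 'effort' score and a 'maintainability' score in [0, 1];
    both the scores and the weight are invented for this illustration.
    """
    return sum(
        p["effort"] * (1 - maintainability_weight * (1 - p["maintainability"]))
        for p in prs
    )


# dev_a ships many small, low-effort, messier PRs.
dev_a = [{"effort": 1.0, "maintainability": 0.5} for _ in range(10)]
# dev_b ships fewer, harder, cleaner PRs.
dev_b = [{"effort": 3.0, "maintainability": 0.9} for _ in range(5)]

# By PR count alone, dev_a looks twice as productive (10 vs 5).
print("PR counts:", len(dev_a), "vs", len(dev_b))
# The effort-adjusted proxy reverses the ranking: dev_b's fewer, harder,
# cleaner PRs score higher than dev_a's high-volume output.
print("effort-adjusted:", effort_adjusted_output(dev_a),
      "vs", round(effort_adjusted_output(dev_b), 2))
```

The point is not the specific weights but the inversion: any metric that counts activity without adjusting for effort and maintainability can rank the wrong developer (or tool) on top, which is exactly the false-positive risk the talk warns about.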
Overall Message:
AI coding assistants deliver tangible productivity benefits for many developers and teams, but the ROI is far from automatic or universal. Hype around massive gains often ignores variance, quality trade-offs, and the need for organizational maturity. Without rigorous measurement and complementary changes in processes, many companies see limited or illusory benefits. The talk balances caution with optimism, stressing that mastering AI usage and measurement is becoming a key competitive differentiator. It ends with practical advice for leaders to move beyond “vibes” to provable value. The video is roughly 20–30 minutes.
This complements the previous Qodo talk by shifting focus from code quality/trust to measurable productivity and business ROI.