The AI Feedback Loop That Isn't Working Yet
Why developers are slower with AI tools despite believing they're faster and what actually works
Developers have a time problem. Not the “I need more hours in the day” kind (though that’s true too), but a feedback problem that costs them hours or even days of productive work.
When you submit code changes, you wait. The CI/CD pipeline runs its tests. Sometimes it fails in the final stage after hours of processing. You get a cryptic error log. You debug. You resubmit. You wait again.
This is the expensive reality of modern software development. A 2024 research paper from Chalmers University identified this pattern, noting that “developers often seek expedited results from these pipelines”, but the architecture of most CI/CD systems works against this preference.
Now we have data showing the problem is more complex than anyone predicted.
The Real Cost: Time Saved, Time Lost
Atlassian’s 2025 State of Developer Experience survey found that AI is saving developers approximately 10 hours per week. That sounds like unqualified success, until you see the other half of the equation.
The same survey found that 50% of developers report losing 10+ hours per week to organizational inefficiencies: finding information, adapting to new technology, and context switching between tools. Developers are saving 10 hours a week with AI and losing 10 hours a week to organizational friction.
We’re right back where we started, except now there’s an illusion of progress.
Most organizations aren’t using AI to address these friction points; they’re using it to speed up the parts that weren’t actually bottlenecks. Developers spend only about 16% of their time coding, and coding isn’t their primary friction point. Yet that’s where most AI investment goes.
The Trust Problem
Developer sentiment tells another part of this story. Positive sentiment for AI tools has decreased in 2025 to just 60%, down from over 70% in both 2023 and 2024. More developers now actively distrust the accuracy of AI tools (46%) than trust it (33%).
Experienced developers are the most cautious, with only 2.6% reporting they “highly trust” AI output and 20% reporting they “highly distrust” it. This widespread understanding that AI outputs require human verification explains why experienced developers often slow down—they’re doing additional verification work.
As Salvatore Sanfilippo observed, while LLMs can write parts of a codebase successfully under strict supervision, “when left alone with nontrivial goals they tend to produce fragile code bases that are larger than needed, complex, full of local minima choices, suboptimal in many ways”.
What Actually Works: CI/CD Integration
Despite the challenges, some applications show genuine promise. The vision of LLMs embedded in CI/CD pipelines has moved from theory to practice.
Tools now embed LLM-powered code reviews into CI/CD workflows, ensuring code quality checks happen automatically with every commit. One finance company reduced build failures by 47% after implementing LLM-based self-healing pipelines, with engineers saving 7.5 hours weekly.
Faire’s implementation of automated code reviews demonstrates how this works in practice. They use LLMs to automate generic review requirements, the checks that don’t require deep project context but do consume reviewer time. This frees human reviewers to focus on architectural decisions and whether code actually meets product requirements.
The difference? Integration into existing workflows rather than standalone tools, focus on organizational friction points rather than individual productivity, and automation of repetitive checks rather than replacement of human judgment.
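The triage pattern behind this kind of automated review can be sketched in a few lines. This is a minimal, illustrative sketch, not Faire’s actual system: the rule-based `generic_checks` function stands in for the LLM call, and the check names are hypothetical. The point is the routing: cheap generic checks run automatically on every changed line, and only the lines they can’t dispose of are escalated to human reviewers.

```python
import re

def generic_checks(line: str) -> list[str]:
    """Flag generic issues that need no project context.

    A rule-based stand-in for the LLM review step; a real pipeline
    would send the line (or hunk) to a model instead.
    """
    issues = []
    if "TODO" in line:
        issues.append("unresolved TODO")
    if re.search(r"print\(", line):
        issues.append("debug print left in code")
    if len(line) > 120:
        issues.append("line exceeds 120 characters")
    return issues

def triage_diff(added_lines: list[str]) -> dict:
    """Split review work: auto-flagged lines vs. lines routed to humans."""
    auto_flags, needs_human = [], []
    for n, line in enumerate(added_lines, 1):
        issues = generic_checks(line)
        if issues:
            auto_flags.append((n, issues))
        else:
            needs_human.append(n)
    return {"auto": auto_flags, "human": needs_human}

diff = [
    "def total(xs):",
    "    print(xs)  # TODO remove",
    "    return sum(xs)",
]
result = triage_diff(diff)
```

Run inside the pull-request workflow, the “auto” bucket becomes bot comments and the “human” bucket is what reviewers actually read, which is where the time savings come from.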
Log Analysis: Where AI Actually Excels
One area where AI demonstrates clear value is log analysis, exactly what the Chalmers research identified as a key opportunity.
Recent studies show LLMs achieve an F1-score of 0.928 for vulnerability detection in log analysis, significantly outperforming traditional models like XGBoost (0.555) and LightGBM (0.432).
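For readers less familiar with the metric: F1 is the harmonic mean of precision and recall, so the scores above say the LLMs are strong on both axes at once, while the tree-based baselines sacrifice one for the other. A quick sketch (the input values below are illustrative, not the studies’ reported precision and recall):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# When precision and recall are equal, F1 equals that shared value.
score = f1(0.93, 0.93)  # → 0.93
```

The harmonic mean punishes imbalance: a model with recall 1.0 but precision 0.5 still only reaches an F1 of about 0.67.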
IBM’s production deployment provides real-world validation. By December 2024, their LLM-based log analysis tool had processed 1,376 cases, handling 877 GB of data and 1.04 billion log lines. Among respondents, 53.79% found the tool beneficial, and 60.4% of products reported saving at least 30 minutes per trigger.
Why does log analysis work when other applications struggle? Three factors: defined scope with clear inputs and outputs, natural language advantage since logs are semi-structured text, and straightforward verification paths.
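Those three factors are easy to see in miniature. In the sketch below, each log line is a self-contained, semi-structured input with a tiny label space, so every output is trivial to verify by eye. The regex format and the keyword list are illustrative stand-ins for the model; a production system would hand the parsed line to an LLM rather than match keywords.

```python
import re

# Assumed log format: "YYYY-MM-DD HH:MM:SS LEVEL message" (illustrative).
LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)"
)

# Keyword rules standing in for the classifier.
SUSPICIOUS = ("failed password", "sql syntax", "permission denied")

def classify_line(line: str):
    """Return 'suspicious'/'benign' for one line, or None if unparsable."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    msg = m.group("message").lower()
    return "suspicious" if any(k in msg for k in SUSPICIOUS) else "benign"

logs = [
    "2025-01-15 09:12:01 WARN Failed password for admin from 10.0.0.7",
    "2025-01-15 09:12:02 INFO Scheduled backup completed",
    "not a log line",
]
labels = [classify_line(line) for line in logs]
```

Contrast this with code generation: here the scope is one line in, one label out, and a human can check any given answer in seconds.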
The Learning Curve Nobody Discussed
The METR study found that three-quarters of participants saw reduced performance when using AI tools. However, one of the top performers with AI had the most previous Cursor experience. The paper acknowledges: “it’s plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup”.
Amazon’s experience with their Q coding assistant tells a similar story. After significant improvements in April 2025, about half of developers found it genuinely helpful, but it still has limitations, including understanding only one file at a time. Interestingly, models fine-tuned on Amazon’s own massive codebase “feel only moderately better than non-trained models”.
Effective AI-assisted development requires significant practice with specific tools, understanding of tool limitations, workflow integration rather than skill replacement, and context management most developers haven’t mastered.
The Empathy Gap
Perhaps most concerning: 63% of developers now say leaders don’t understand their pain points, up sharply from 44% in 2024.
This widening empathy gap explains why AI deployment often misses the mark. Leaders see developers using AI and assume productivity is improving. Developers feel the verification work and the context-switching overhead firsthand, yet, as the METR data suggests, often still believe the tools are making them faster. Neither perception matches the measured reality.
The JetBrains 2025 Developer Ecosystem survey found that 66% of developers don’t believe current metrics reflect their true contributions. While tech decision-makers dream of reducing technical debt, developers want transparency, constructive feedback, and clarity of goals.
Internal collaboration, communication, and clarity are now just as important as faster CI pipelines or better IDEs. Yet organizations continue to invest primarily in the latter.
What This Means for Engineering Leaders
Measure what matters: If developers take longer with AI but believe they’re faster, your productivity metrics aren’t capturing reality. Time-to-completion matters, but so do code quality, maintainability, and developer confidence.
Focus on friction, not features: Developers lose time to finding information, adapting to new technology, and context switching, none of which AI coding assistants address. The time saved writing code gets consumed by organizational inefficiency.
Integration over innovation: The most successful AI deployments integrate into existing workflows. Faire’s automated code reviews work because they happen within the pull request process developers already use.
The learning curve is real: Don’t expect immediate productivity gains. Developers need significant experience with specific AI tools before seeing benefits. Budget for training time.
Trust the skeptics: Experienced developers are the most cautious about AI tools—and they’re often right to be. Their skepticism reflects understanding of where AI helps and where it introduces problems.

