The AI Feedback Loop That Isn't Working Yet
Why developers are slower with AI tools despite believing they're faster, and what actually works
Developers have a time problem. Not the “I need more hours in the day” kind (though that’s true too), but a feedback problem that costs them hours or even days of productive work.
When you submit code changes, you wait. The CI/CD pipeline runs its tests. Sometimes it fails in the final stage after hours of processing. You get a cryptic error log. You debug. You resubmit. You wait again.
This is the expensive reality of modern software development. A 2024 research paper from Chalmers University identified this pattern, noting that “developers often seek expedited results from these pipelines”, but the architecture of most CI/CD systems works against this preference.
Now we have data showing the problem is more complex than anyone predicted.
The Real Cost: Time Saved, Time Lost
Atlassian’s 2025 State of Developer Experience survey found that AI is saving developers approximately 10 hours per week. That sounds like unqualified success, until you see the other half of the equation.
The same survey found that 50% of developers report losing 10+ hours per week to organizational inefficiencies: finding information, adapting to new technology, and context switching between tools. Developers are saving 10 hours a week with AI and losing 10 hours a week to organizational friction.
We’re right back where we started, except now there’s an illusion of progress.
Most organizations aren’t using AI to address friction points; they’re using it to speed up the parts that weren’t actually bottlenecks. Developers spend only about 16% of their time coding, and coding isn’t their primary friction point. Yet that’s where most AI investment goes.
The Trust Problem
Developer sentiment tells another part of this story. Positive sentiment for AI tools has decreased in 2025 to just 60%, down from over 70% in both 2023 and 2024. More developers now actively distrust the accuracy of AI tools (46%) than trust it (33%).
Experienced developers are the most cautious, with only 2.6% reporting they “highly trust” AI output and 20% reporting they “highly distrust” it. This widespread understanding that AI outputs require human verification explains why experienced developers often slow down—they’re doing additional verification work.
As Salvatore Sanfilippo observed, while LLMs can write parts of a codebase successfully under strict supervision, “when left alone with nontrivial goals they tend to produce fragile code bases that are larger than needed, complex, full of local minima choices, suboptimal in many ways”.
What Actually Works: CI/CD Integration
Despite the challenges, some applications show genuine promise. The vision of LLMs embedded in CI/CD pipelines has moved from theory to practice.
Tools now embed LLM-powered code reviews into CI/CD workflows, ensuring code quality checks happen automatically with every commit. One finance company reduced build failures by 47% after implementing LLM-based self-healing pipelines, with engineers saving 7.5 hours weekly.
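To make the pattern concrete, here is a minimal sketch of what an LLM review step can look like inside a CI job. It is illustrative rather than any particular vendor’s implementation; the prompt, the file filter, and the call_review_model placeholder are assumptions you would adapt to your own pipeline and provider.

```python
"""Minimal sketch of an LLM review step a CI job could run on each pull
request. Illustrative only: call_review_model() is a placeholder you would
wire to your LLM provider's SDK."""
import subprocess
import sys

REVIEW_PROMPT = """You are a code reviewer. Check ONLY for generic issues:
naming consistency, missing tests for new functions, unclear error handling.
Reply with a bullet list of findings, or the single word PASS."""

def changed_diff(base_ref: str = "origin/main") -> str:
    # Diff the PR branch against the base branch; CI has checked out the PR head.
    return subprocess.run(
        ["git", "diff", base_ref, "--", "*.py"],
        capture_output=True, text=True, check=True,
    ).stdout

def call_review_model(prompt: str, diff: str) -> str:
    # Placeholder: send prompt + diff to your LLM provider and return its text.
    raise NotImplementedError("wire this to your provider's API")

def main() -> int:
    diff = changed_diff()
    if not diff:
        return 0  # nothing to review
    findings = call_review_model(REVIEW_PROMPT, diff)
    print(findings)
    # Exit non-zero to surface findings as a check; many teams post a PR
    # comment instead of blocking the merge.
    return 0 if findings.strip() == "PASS" else 1

if __name__ == "__main__":
    sys.exit(main())
```

The design choice that matters is that the step runs inside the pull-request workflow developers already use, and a human still decides what, if anything, blocks the merge.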
Faire’s implementation of automated code reviews demonstrates how this works in practice. They use LLMs to automate generic review requirements: the checks that don’t require deep project context but do consume reviewer time. This frees human reviewers to focus on architectural decisions and whether code actually meets product requirements.
The difference? Integration into existing workflows rather than standalone tools, focus on organizational friction points rather than individual productivity, and automation of repetitive checks rather than replacement of human judgment.
Log Analysis: Where AI Actually Excels
One area where AI demonstrates clear value is log analysis, exactly what the Chalmers research identified as a key opportunity.
Recent studies show LLMs achieve an F1-score of 0.928 for vulnerability detection in log analysis, significantly outperforming traditional models like XGBoost (0.555) and LightGBM (0.432).
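For context, F1 is the harmonic mean of precision and recall, which is why a single 0.928 figure is meaningful: it can only be high when the model is strong on both axes. A quick illustration (the values below are made up, not from the study):

```python
def f1_score(precision: float, recall: float) -> float:
    # Harmonic mean: punishes imbalance between precision and recall.
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.93, 0.93), 3))  # 0.93: balanced detector
print(round(f1_score(0.99, 0.40), 3))  # 0.57: precision can't rescue poor recall
```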
IBM’s production deployment provides real-world validation. By December 2024, their LLM-based log analysis tool had processed 1,376 cases, handling 877 GB of data and 1.04 billion log lines. Among respondents, 53.79% found the tool beneficial, and 60.4% of products reported saving at least 30 minutes per trigger.
Why does log analysis work when other applications struggle? Three factors: defined scope with clear inputs and outputs, natural language advantage since logs are semi-structured text, and straightforward verification paths.
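Here is a minimal sketch of how those three factors combine in practice, assuming a hypothetical summarize_with_llm() helper wired to your provider: cheap regex pre-filtering bounds the scope, the model works on the semi-structured text it handles well, and returning the raw evidence lines with every summary keeps verification straightforward.

```python
"""Illustrative sketch of LLM-assisted log triage: pre-filter with cheap
heuristics, send only the suspicious window to the model, and keep the raw
lines alongside the summary so a human can verify quickly."""
import re
from typing import Iterable

ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL|Traceback|OOMKilled)\b")

def suspicious_windows(lines: list[str], context: int = 20) -> Iterable[list[str]]:
    # Yield a window of surrounding lines for each error-like hit, so the
    # model sees what led up to the failure. (Real use would merge overlaps.)
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            yield lines[max(0, i - context): i + context]

def summarize_with_llm(window: list[str]) -> str:
    # Placeholder: ask the model for a probable root cause plus the exact
    # log lines that support it, which keeps the output checkable.
    raise NotImplementedError("wire this to your provider's API")

def triage(log_text: str) -> list[dict]:
    lines = log_text.splitlines()
    return [
        {"summary": summarize_with_llm(w), "evidence": w}
        for w in suspicious_windows(lines)
    ]
```

In real use you would also merge overlapping windows and cap token budgets, but the shape (narrow scope in, checkable evidence out) is the part that matters.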
The Learning Curve Nobody Discussed
The METR study found that three-quarters of participants saw reduced performance when using AI tools. However, one of the top performers with AI had the most previous Cursor experience. The paper acknowledges: “it’s plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup”.
Amazon’s experience with their Q coding assistant tells a similar story. After significant improvements in April 2025, about half of developers found it genuinely helpful, but it still has limitations, including understanding only one file at a time. Interestingly, models fine-tuned on Amazon’s own massive codebase “feel only moderately better than non-trained models”.
Effective AI-assisted development requires significant practice with specific tools, understanding of tool limitations, workflow integration rather than skill replacement, and context management most developers haven’t mastered.
The Empathy Gap
Perhaps most concerning: 63% of developers now say leaders don’t understand their pain points, up sharply from 44% in 2024.
This widening empathy gap explains why AI deployment often misses the mark. Leaders see developers using AI and assume productivity is improving. Developers experience the slowdown, the verification work, and the context-switching overhead—but that lived reality rarely reaches leadership.
The JetBrains 2025 Developer Ecosystem survey found that 66% of developers don’t believe current metrics reflect their true contributions. While tech decision-makers dream of reducing technical debt, developers want transparency, constructive feedback, and clarity of goals.
Internal collaboration, communication, and clarity are now just as important as faster CI pipelines or better IDEs. Yet organizations continue to invest primarily in the latter.
What This Means for Engineering Leaders
Measure what matters: If developers take longer with AI but believe they’re faster, your productivity metrics aren’t capturing reality. Time-to-completion matters, but so do code quality, maintainability, and developer confidence.
Focus on friction, not features: Developers lose time to finding information, adapting to new technology, and context switching, none of which AI coding assistants address. The time saved writing code gets consumed by organizational inefficiency.
Integration over innovation: The most successful AI deployments integrate into existing workflows. Faire’s automated code reviews work because they happen within the pull request process developers already use.
The learning curve is real: Don’t expect immediate productivity gains. Developers need significant experience with specific AI tools before seeing benefits. Budget for training time.
Trust the skeptics: Experienced developers are the most cautious about AI tools—and they’re often right to be. Their skepticism reflects understanding of where AI helps and where it introduces problems.
The Path That Actually Works
The evidence points toward a clear pattern:
Start with log analysis and CI/CD integration, where AI demonstrates clear value. These applications have defined scope, shorter feedback loops, and easier verification.
Automate the repetitive, not the creative. Use AI for style consistency checks, error log analysis, and test case generation. Don’t expect it to architect systems or make nuanced product decisions.
Measure time-to-feedback, not just time-to-completion. Real value comes from shortening feedback loops—helping developers know sooner whether their approach will work (a sketch of this metric follows after these points).
Address organizational friction first. AI can’t fix poor documentation, unclear requirements, or excessive context-switching. It might amplify these problems.
Build AI literacy as a team skill. Developers who benefit most have invested significant time learning tool strengths and limitations.
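On the time-to-feedback point above, here is a minimal sketch of the metric, assuming you can export CI events as (commit_sha, event_type, timestamp) records; the event names are hypothetical and would map to whatever your CI system actually emits.

```python
from datetime import datetime
from statistics import median

def time_to_feedback_minutes(events: list[dict]) -> list[float]:
    """Minutes from push to the first signal a developer can act on
    (first failed check, or the final green status if nothing failed)."""
    per_commit: dict[str, dict] = {}
    for e in sorted(events, key=lambda e: e["timestamp"]):
        record = per_commit.setdefault(e["commit_sha"], {})
        if e["event_type"] == "push":
            record.setdefault("push", e["timestamp"])
        elif e["event_type"] in ("check_failed", "all_checks_passed"):
            record.setdefault("signal", e["timestamp"])
    return [
        (r["signal"] - r["push"]).total_seconds() / 60
        for r in per_commit.values()
        if "push" in r and "signal" in r
    ]

# Hypothetical events exported from a CI system:
events = [
    {"commit_sha": "abc", "event_type": "push",
     "timestamp": datetime(2025, 6, 1, 9, 0)},
    {"commit_sha": "abc", "event_type": "check_failed",
     "timestamp": datetime(2025, 6, 1, 9, 42)},
]
print(median(time_to_feedback_minutes(events)))  # 42.0 minutes
```

Track the median week over week; a shrinking number means developers learn sooner whether their approach will work, regardless of how fast the code itself was written.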
The Real Opportunity
AI tools for development work when they address actual friction points, integrate into existing processes, and focus on areas where verification is straightforward. They don’t work when deployed as generic productivity enhancers or when organizations use them to avoid addressing systemic problems.
The 2024 Chalmers research proposed a vision of real-time, integrated AI feedback throughout the development cycle. That vision remains compelling. But the path runs through organizational culture, not just technology.
The competitive advantage goes to teams who can iterate faster while maintaining quality. But “faster” doesn’t mean writing code more quickly. It means shortening feedback loops, reducing cognitive load, and helping developers maintain flow state.
AI can help with all of these, if we deploy it thoughtfully. But thoughtful deployment requires understanding that developer experience isn’t a tooling problem. It’s a culture problem that tools can help solve, but only if we use them to address the real sources of friction.
The organizations that will benefit most are those that approach AI adoption systematically: identifying specific friction points, measuring actual outcomes rather than perceived productivity, and iterating based on what works rather than what the vendor deck promised.
The question isn’t whether AI will transform software development—it already is. The question is whether we’ll use it to actually improve developer experience, or just create an illusion of progress while the same old problems persist beneath a shiny new interface.
Want to suggest a topic for me to write about? Submit your idea here and I might tackle it in an upcoming post.
