The vendor marketing of the past year has been dominated by autonomous agents. Demos of multi-step reasoning systems booking travel, writing code, and orchestrating tool calls have absorbed most of the available attention, and a sizeable share of enterprise AI budgets has been earmarked for agent pilots. The work actually moving into production inside large organisations tells a different story. Most of it is retrieval: connecting models to internal document stores, structured data, and conversational interfaces over information the buyer already owns.
The gap between the two narratives is widening, and it matters for how enterprise AI teams are setting their roadmaps over the next two years.
What retrieval pipelines are actually doing in production
The retrieval pattern is unglamorous and well understood. A user query is converted into a search against a vector index, a keyword index, or some combination of both. The retrieved documents are passed into a model along with the query, and the model produces an answer grounded in the retrieved content. Variations exist around chunking, re-ranking, hybrid search, and citation, but the basic architecture has been stable for nearly two years.
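The flow described above can be sketched in a few lines. This is a toy illustration rather than a production design: the corpus, the document ids, and the function names are hypothetical, and a bag-of-words cosine score stands in for a real embedding model and vector index.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real deployment would index the organisation's own documents.
DOCS = {
    "hr-leave": "Employees accrue annual leave monthly and may carry over five days.",
    "it-vpn": "Connect to the VPN before accessing internal systems from home.",
    "hr-expenses": "Expense claims must be submitted within thirty days with receipts.",
}

def tokens(text):
    # Lowercase words with surrounding punctuation stripped.
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]

def cosine(a, b):
    # Bag-of-words cosine similarity; an assumption standing in for embedding similarity.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=2):
    # Score every document against the query and keep the top k with a non-zero score.
    q = tokens(query)
    scored = sorted(((cosine(q, tokens(text)), doc_id) for doc_id, text in DOCS.items()), reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]

def build_prompt(query, doc_ids):
    # Pass the retrieved text to the model alongside the question, tagged with ids for citation.
    sources = "\n".join(f"[{d}] {DOCS[d]}" for d in doc_ids)
    return f"Answer using only the sources below and cite their ids.\n{sources}\n\nQuestion: {query}"

question = "How many days of annual leave carry over?"
prompt = build_prompt(question, retrieve(question))
```

The resulting prompt would then be sent to whichever model the team uses; that call is left out because it depends on the vendor.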
What that architecture does inside enterprises is also fairly consistent. Four kinds of deployment recur: internal knowledge assistants that answer employee questions from policy documents and HR manuals; customer support workflows that surface relevant past tickets and product documentation to human agents; sales enablement tools that pull from a library of approved messaging and competitive intelligence; and legal and compliance review tools that surface relevant clauses from contract repositories.
None of these is conceptually new. What is new is that the model layer makes them useful in ways that earlier enterprise search and chatbot generations were not. The retrieval pattern works because the model can synthesise across multiple retrieved documents, handle the messy phrasing of natural questions, and produce answers in a format the user can actually consume.
Why agent pilots have been slower to land
The agent pattern is harder to deploy in production for a set of practical reasons that have become more visible as pilots have matured.
Multi-step reasoning chains compound errors. A pipeline that needs five model calls to complete a task, each with some probability of producing a flawed step, will fail more often than the accuracy of any single call suggests, because every step has to succeed for the task to succeed. Enterprise workflows often have low tolerance for this kind of compounding failure, and the engineering effort required to make agentic systems reliable is substantial.
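A short calculation makes the compounding concrete. The 95% per-step accuracy is an illustrative assumption, not a measured figure, and the sketch also assumes that step failures are independent, which real pipelines only approximate.

```python
def chain_success_rate(per_step_accuracy, steps):
    # Under the independence assumption, the chain succeeds only if every step does,
    # so the per-step success probabilities multiply.
    return per_step_accuracy ** steps

# 0.95 is a hypothetical per-step accuracy chosen for illustration.
for steps in (1, 3, 5, 10):
    rate = chain_success_rate(0.95, steps)
    print(f"{steps:>2} steps: {rate:.1%} succeed, {1 - rate:.1%} fail")
```

Under these assumptions a five-call chain completes cleanly about 77% of the time, so a per-step error rate of 5% becomes a task failure rate of roughly 23%.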
Tool integration is also harder than it looks. Most enterprise environments do not have clean, well-documented APIs across the systems an agent would need to act on, and the integration work to give an agent meaningful capability often resembles a conventional enterprise integration project: long, expensive, and dependent on cooperation from system owners who may not see why the AI team needs access.
The third issue is observability. When an agent makes a decision the buyer disagrees with, tracing the reasoning back through the chain of model calls, tool invocations, and intermediate outputs is not trivial. Most enterprise AI teams have not built the observability tooling that would make agent debugging tractable at scale. Retrieval pipelines, by contrast, have a much shorter trace from query to answer.
What the budget data is starting to show
Procurement signals back up what infrastructure teams have been describing informally for months. Enterprise spending on vector databases, embedding model APIs, and retrieval orchestration tools has grown substantially across most surveyed segments. Spending on agent-specific frameworks and orchestration platforms has grown more slowly, and a meaningful share of agent pilots has either stalled in evaluation or been redirected toward narrower retrieval-flavoured use cases.
That redirection is worth paying attention to. Several enterprise AI teams that started agent pilots in the first half of last year have described the same pattern: the original agent scope proves harder to deliver reliably than expected, the team scales the ambition down to a retrieval-augmented workflow with one or two narrow tool calls, and the resulting system ships to production successfully. The label remains "agent" in internal communications, but the actual architecture is much closer to retrieval with controlled actions.
This is not a failure of the agent paradigm. It is a sign that the gap between the demo and the production system is larger for agents than for retrieval, and that enterprise AI teams are finding the practical edge of what they can ship.
Where the retrieval pattern is hitting its own limits
Retrieval is not a universal solution. The pattern works well when the answer to a question is contained in the buyer's existing documents and structured data. It works less well when the question requires reasoning across documents that contradict each other, when the answer depends on data the buyer does not have, or when the task involves taking action rather than answering a question.
The current frontier of retrieval engineering is dealing with the harder cases. Hybrid search systems combining vector and keyword retrieval have improved recall on technical and domain-specific content. Re-ranking models help with the precision problem on large document collections. Structured retrieval against databases and knowledge graphs is being layered alongside document retrieval to handle questions that require numerical or relational answers.
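One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion. The article does not say which fusion method these teams use, so the sketch below is an assumption offered for illustration; the document ids are hypothetical, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    # Each ranking is a list of document ids, best first.
    # A document earns 1 / (k + rank) from every ranking it appears in.
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a vector index and a keyword index for the same query.
vector_hits = ["doc-a", "doc-b", "doc-c"]
keyword_hits = ["doc-c", "doc-a", "doc-d"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that rank well in both lists rise to the top, which is why fusion tends to help on technical content where exact terms and semantic similarity each catch what the other misses.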
The hardest unsolved problem is grounding quality at scale. A retrieval system that returns the correct supporting documents but allows the model to introduce unsupported claims in its answer is worse than no system at all in many enterprise contexts. The teams running the most mature retrieval deployments have invested heavily in citation discipline, output verification, and evaluation tooling that can detect when the model has gone beyond its sources.
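A very crude form of citation discipline can be checked mechanically. The sketch below assumes a hypothetical convention in which the model marks each sentence with a bracketed source id such as `[hr-leave]` placed before the closing punctuation. It only flags sentences with no citation or with a citation to a document that was never retrieved; it does not verify that the cited source actually supports the claim, which is the harder part of the problem.

```python
import re

# Matches bracketed ids such as [hr-leave]; the id format is an assumption.
CITATION = re.compile(r"\[([\w-]+)\]")

def unsupported_sentences(answer, retrieved_ids):
    allowed = set(retrieved_ids)
    flagged = []
    # Split after sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        cited = set(CITATION.findall(sentence))
        # Flag sentences with no citation, or citing something outside the retrieved set.
        if not cited or not cited <= allowed:
            flagged.append(sentence)
    return flagged

answer = (
    "Leave accrues monthly [hr-leave]. "
    "Unused leave is paid out in cash. "
    "Claims need receipts [hr-travel]."
)
flagged = unsupported_sentences(answer, ["hr-leave", "hr-expenses"])
```

Here the second sentence is flagged for having no citation and the third for citing a document that was not in the retrieved set.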
What this means for AI roadmaps
The practical implication for enterprise AI teams is that the next twelve to eighteen months are likely to look more like the past twelve than like the agent-dominated future that vendor marketing has been pointing to. The retrieval pattern still has substantial unrealised value in most enterprises. Making existing retrieval deployments more reliable, better cited, and better integrated into actual user workflows is a large and productive backlog that most teams have not yet worked through.
Agent work belongs on the roadmap, but probably in narrower scope than the early pilots assumed. The teams getting useful production value from agentic patterns are typically running them in tightly bounded environments (a single business process, a small number of tools, well-defined success criteria) rather than as general-purpose autonomous systems. That is not the agent future the demos have promised, but it is the agent reality the production work supports.
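What "tightly bounded" can mean in code is an explicit tool allowlist and a step budget enforced outside the model. The sketch below is one possible shape under stated assumptions: the tool names, the plan format, and the default budget of three steps are all hypothetical, and in a real system the plan would be proposed by a model rather than written by hand.

```python
class ToolNotAllowed(Exception):
    """Raised when a plan names a tool outside the allowlist."""

class StepBudgetExceeded(Exception):
    """Raised when a plan is longer than the step budget."""

def run_bounded(plan, tools, max_steps=3):
    # plan: list of (tool_name, argument) pairs.
    if len(plan) > max_steps:
        raise StepBudgetExceeded(f"{len(plan)} steps requested, budget is {max_steps}")
    # Validate the whole plan before running anything, so a rejected plan has no side effects.
    for tool_name, _ in plan:
        if tool_name not in tools:
            raise ToolNotAllowed(tool_name)
    return [tools[tool_name](argument) for tool_name, argument in plan]

# Hypothetical tools for a single support process.
TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
    "draft_reply": lambda status: f"Your order has {status}.",
}

results = run_bounded([("lookup_order", "A-1001")], TOOLS)
```

Keeping the limits in ordinary code rather than in the prompt means they hold even when the model proposes something unexpected.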
The interesting question for the next two years is whether the model layer will close the reliability gap that currently makes broad agentic deployment difficult. There is no shortage of work happening on that question across the major labs, and the answer will shape what enterprise AI looks like by the end of the decade. In the meantime, the work that pays off is the work being done on retrieval. That is where the quiet majority of enterprise AI value has been created so far, and where most of it will continue to come from in the near term.








