How Does MCP Elicitation Enable Human-in-the-Loop?


The development of sophisticated, agent-based AI systems has consistently faced a significant bottleneck: the challenge of creating seamless, scalable, and standardized communication between large language models (LLMs) and the vast ecosystem of third-party applications and backends. For years, developers have been mired in the process of building custom, one-off integrations, writing bespoke logic to connect agents with the unique APIs of each service they need to interact with. This approach is not only laborious and time-consuming but also fundamentally brittle; every new application requires a new manual integration, and any change to an API can break the entire chain. This fragmented landscape has hindered the creation of truly dynamic and interactive AI agents that can fluidly request information, seek clarification, and involve human users in their decision-making processes. The dream of an AI assistant that can pause, ask for confirmation before deleting a file, or prompt for missing details when booking a reservation has been just out of reach, often requiring complex and inefficient workarounds that compromise context and increase operational costs. A new standard was desperately needed to abstract away this complexity and enable a more fluid, interactive future for AI.

1. The Evolution of MCP and the Dawn of Elicitation

The Model Context Protocol (MCP) emerged as a powerful solution to this integration challenge, introducing an open standard developed by Anthropic to streamline how LLMs receive data. By establishing a single, standardized format for data exchange, MCP effectively decouples the agent from the specific implementation details of backend applications. The burden of creating custom connections and parsing unique API responses is lifted from the AI developer. Instead, the responsibility shifts to application developers, who can now expose their services through a unified MCP interface. This allows any compliant model or agent framework to understand and interact with the service out of the box, fostering a more interoperable and scalable ecosystem. It transforms the development process from a series of bespoke, hard-coded integrations into a plug-and-play model, where agents can discover and utilize tools from any MCP-compliant server without requiring custom logic for each one. This standardization is the foundational layer upon which more complex and interactive agent behaviors can be built.

The protocol has not remained static, with a significant update in June 2025 introducing several key enhancements that expand its capabilities. One major addition is Structured Tool Output, which allows tools to return data in a structured format rather than as simple strings, greatly simplifying the process of parsing results and enabling more sophisticated data processing pipelines. On the security front, MCP servers are now treated as full-fledged OAuth 2.1 Resource Servers, mandating the use of Resource Indicators to prevent the misuse of access tokens across different services. Furthermore, tool responses can now include Resource Links, providing direct references to external assets like files or logs. However, the most transformative new feature is Elicitation. This mechanism allows an MCP server to actively request input from the user during a tool’s execution, pausing the process until the necessary information is provided. Elicitation represents a paradigm shift, moving beyond simple request-response interactions to enable a truly collaborative and interactive experience between the user, the agent, and the underlying tools.

2. Bridging the Gap Between Agents and Users

Elicitation is a formal mechanism within the Model Context Protocol that empowers a server to temporarily halt a tool’s execution and await missing data from the client, effectively bringing a human into the operational loop. This concept is similar to the Human-in-the-Loop (HITL) principle seen in various agent frameworks, where an agent pauses its task flow to seek user guidance or confirmation before proceeding. However, MCP’s implementation is architecturally distinct and addresses a unique set of challenges. An MCP server is designed to be isolated; it runs as an independent process that is completely unaware of the frontend user interface. Its sole communication channel is with the agent that invokes its tools. This decoupling, while beneficial for modularity and scalability, creates a problem: what happens when a tool, running on this isolated server, requires information it doesn’t have and cannot fetch on its own? This is the critical gap that Elicitation is designed to fill, providing a standardized protocol for the server to formally request data through the agent, which can then relay the request to the user.

Without Elicitation, handling such scenarios was often a cumbersome and inefficient process. Consider a common use case: an MCP server provides a BookTable tool for a restaurant reservation chatbot. When the agent calls this tool, it might lack essential information like the desired time, number of guests, or the name for the reservation. A typical workaround involved creating two separate tools: a gather_booking_info_tool for the frontend to collect data from the user, and a process_booking_tool on the backend to complete the booking once all information was present. This approach fragments the logic, often leads to a loss of context between the two steps, results in duplicate requests to the agent, and consumes additional tokens. Elicitation elegantly solves this by allowing the single BookTable tool to pause its own execution, send a structured request for the missing data, and wait for a response—all within a single request and session. The tool’s input can now come from both the LLM and the user, streamlining the workflow, preserving context, and creating a much more seamless and efficient interaction.
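To make the contrast concrete, here is a minimal sketch of a single BookTable tool that pauses for missing details instead of being split into gather/process halves. The `ElicitFn` signature and the tool shape are illustrative placeholders, not a specific framework's API:

```typescript
// Hypothetical sketch: one tool whose input can come from both the LLM
// (via `input`) and the user (via elicitation), as described above.

type ElicitResult =
  | { action: "accept"; content: Record<string, unknown> }
  | { action: "decline" }
  | { action: "cancel" };

// Placeholder for whatever elicitation hook the server framework provides.
type ElicitFn = (message: string, schema: object) => Promise<ElicitResult>;

interface BookingInput {
  time?: string;
  guests?: number;
  name?: string;
}

async function bookTable(input: BookingInput, elicit: ElicitFn): Promise<string> {
  let { time, guests, name } = input;
  // Pause mid-execution and ask the user for whatever the LLM did not supply.
  if (!time || !guests || !name) {
    const result = await elicit("Please complete your reservation details", {
      type: "object",
      properties: {
        time: { type: "string" },
        guests: { type: "number" },
        name: { type: "string" },
      },
      required: ["time", "guests", "name"],
    });
    if (result.action !== "accept") return "Booking cancelled by user.";
    ({ time, guests, name } = result.content as Required<BookingInput>);
  }
  return `Booked a table for ${guests} at ${time} under "${name}".`;
}
```

Because the elicitation happens inside the tool's own execution, the booking logic, validation, and context all stay in one place rather than being spread across two tools and two agent turns.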

3. Implementing Elicitation in a Real-World Application

The Elicitation protocol itself is straightforward and relies on a structured exchange. When a tool determines it needs user input, it initiates an elicitation request by sending a JSON Schema that clearly defines the required fields. The protocol supports common data types, including strings, numbers, booleans, and enumerations, providing flexibility for a wide range of use cases. In response, the client must return an object containing an action and, if applicable, the requested content. The action can be one of three types: accept, indicating the user has approved the request and provided the necessary data; decline, signifying an explicit refusal; or cancel, used when the user dismisses the prompt without making a choice. A robust server-side handler must be designed to properly manage each of these potential outcomes. For a practical implementation, one can leverage a modern tech stack like Mastra, an open-source TypeScript framework, which simplifies the creation of agents and MCP servers. Using tools like Bun and Hono.js, a developer can quickly set up a high-performance HTTP server capable of hosting multiple MCP services, complete with support for WebSockets and authentication. A sample tool, such as one that runs shell commands, can be configured to trigger an elicitation prompt for user confirmation before executing potentially dangerous operations like rm or kill.
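A rough sketch of one elicitation round trip, based on the shapes described above: the server sends a message plus a JSON Schema of the fields it needs, and the client replies with exactly one of the three actions. The field values here (the shell-command confirmation) are illustrative:

```typescript
// Server -> client: a human-readable message plus a schema of required fields.
const elicitationRequest = {
  message: "This command is destructive. Confirm before running it.",
  requestedSchema: {
    type: "object",
    properties: {
      confirm: { type: "boolean", description: "Run the command?" },
      reason: { type: "string" },
    },
    required: ["confirm"],
  },
};

// Client -> server: one of the three possible outcomes.
type ElicitationResponse =
  | { action: "accept"; content: { confirm: boolean; reason?: string } }
  | { action: "decline" } // explicit refusal
  | { action: "cancel" }; // prompt dismissed without a choice

// A robust server-side handler must cover every branch.
function handleResponse(res: ElicitationResponse): "proceed" | "skip" | "abort" {
  switch (res.action) {
    case "accept":
      return res.content.confirm ? "proceed" : "skip";
    case "decline":
    case "cancel":
      return "abort";
  }
}
```

Note that `accept` still carries the user's content, so the tool must check the returned values; a user can accept the prompt and still answer "no".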

On the other side of the interaction is the agent and its user interface, which can be built using a framework like assistant-ui within a Next.js project. After the initial setup, which involves installing the necessary dependencies and configuring an API key for a service like OpenAI, the core task is to create a Mastra Agent and connect it to the MCP server. This is typically handled in a backend API route that processes incoming chat requests. The agent is configured to use the MCP client, enabling it to discover and execute the tools exposed by the server. The crucial step is implementing the logic to handle elicitation events. Within the agent’s tool execution method, a subscription is created to listen for elicitation requests from the MCP server. A basic implementation might involve a simple timeout, where the agent waits for a brief period (e.g., 10 seconds) for a response before automatically rejecting the request. This initial setup confirms that the communication channel is working correctly—a safe command executes immediately, while a dangerous one triggers the pause, demonstrating that the fundamental elicitation mechanism is in place and ready to be connected to a user-facing UI.
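The timeout fallback mentioned above can be sketched as a simple race between the user's answer and an auto-rejection. `waitForUser` is a placeholder for the real UI channel; the 10-second default mirrors the example in the text:

```typescript
type Action = "accept" | "decline" | "cancel";
interface ElicitResponse {
  action: Action;
  content?: Record<string, unknown>;
}

// Wait up to `ms` for the user's elicitation response; otherwise
// automatically decline so the MCP server is never left hanging.
function withTimeout(
  waitForUser: () => Promise<ElicitResponse>,
  ms = 10_000
): Promise<ElicitResponse> {
  const timeout = new Promise<ElicitResponse>((resolve) =>
    setTimeout(() => resolve({ action: "decline" }), ms)
  );
  // Whichever settles first wins: the user's answer or the auto-rejection.
  return Promise.race([waitForUser(), timeout]);
}
```

Even once a real UI is wired up, keeping a timeout like this is a sensible safety net: it guarantees the waiting tool eventually receives a well-formed `decline` instead of blocking forever on an abandoned browser tab.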

4. Overcoming Technical Hurdles with a Custom UI Solution

A significant technical obstacle arises when trying to build a truly interactive UI for elicitation: standard HTTP streaming is unidirectional. Once the agent begins processing a request and streaming its response back to the client, that connection is dedicated to server-to-client communication. The frontend cannot send new information, such as a user’s confirmation, back to the server over the same active connection. The conventional solution in many agent frameworks is to terminate the current execution thread when user input is needed. The frontend collects the input and then initiates an entirely new request to the agent, passing the user’s response as the result of the tool call. However, this approach is incompatible with MCP Elicitation. The MCP server is still active, holding an open Promise and awaiting a response from the backend to resolve it. Terminating the thread would sever the connection to the MCP server, causing its eventual result to be lost and breaking the entire workflow. This limitation necessitates a more sophisticated, out-of-band communication strategy to relay the user’s decision back to the waiting process without interrupting the primary data stream.

The solution lies in creating a custom workaround that bypasses the limitations of the main HTTP stream. When the backend receives an elicitation request from the MCP server, it doesn’t just wait; it forwards the request data through a secondary channel to a generative UI component on the frontend. This component then renders the appropriate interface, such as “Accept” and “Decline” buttons. When the user interacts with this UI, it triggers a completely separate and independent request to a dedicated backend endpoint specifically designed to handle elicitation responses. This endpoint uses a shared, in-memory store, such as a Map that holds references to all active elicitation Promises. Using a unique identifier, the endpoint retrieves the correct waiting Promise and resolves it with the user’s provided action and data. This resolution allows the original MCP execution thread, which has been patiently waiting, to resume its operation seamlessly. For more complex, distributed applications running in edge environments or across multiple servers, this in-memory store would need to be replaced by a more robust external coordination layer, such as Redis with a pub/sub model or a dedicated WebSocket server, but the core principle of out-of-band communication remains the same.
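The in-memory store described above can be sketched in a few lines: the elicitation handler parks a Promise's resolver under a unique id, and the dedicated endpoint later looks that resolver up and fires it with the user's choice. Function names here are illustrative:

```typescript
import { randomUUID } from "node:crypto";

interface ElicitResponse {
  action: "accept" | "decline" | "cancel";
  content?: Record<string, unknown>;
}

// Shared store of every elicitation still waiting on a user decision.
const pending = new Map<string, (res: ElicitResponse) => void>();

// Called when the MCP server raises an elicitation: park a Promise and
// hand its id to the frontend (e.g. alongside the streamed tool-call event).
function awaitUserResponse(): { id: string; response: Promise<ElicitResponse> } {
  const id = randomUUID();
  const response = new Promise<ElicitResponse>((resolve) => pending.set(id, resolve));
  return { id, response };
}

// Called by the dedicated endpoint when the user clicks Accept/Decline:
// resolve the parked Promise so the original MCP thread can resume.
function resolveElicitation(id: string, res: ElicitResponse): boolean {
  const resolve = pending.get(id);
  if (!resolve) return false; // unknown or already-resolved id
  pending.delete(id);
  resolve(res);
  return true;
}
```

Because `pending` lives in a single process's memory, this only works when the streaming route and the elicitation endpoint share that process; as noted above, a distributed deployment would swap the Map for Redis pub/sub or a WebSocket coordination layer while keeping the same park-and-resolve pattern.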

5. A New Paradigm for Agent Interaction

Elicitation fundamentally alters LLM agent architecture by providing a standardized method for incorporating genuine user interaction directly into the operational flow without disrupting context. By enabling MCP servers not just to respond but to actively query for information, it overcomes a significant limitation. This shift from passive data provider to active participant in the conversation is a major step forward in creating more capable and intuitive AI agents. The practical implementation demonstrates that, even with the limitations of current frontend tooling and protocols, robust solutions are achievable. The successful creation of a confirmation prompt for a shell command, which respects user input to either proceed or halt, validates the entire concept. The development process also highlights the need for careful state management and out-of-band communication channels to bridge the gap between a waiting backend process and an interactive frontend.

The resulting application serves as a proof of concept for a new class of dynamic, user-aware agent workflows that were previously difficult to build. While broad framework support is still in its early stages, the path forward is clear. Elicitation simplifies the development of rich, interactive agent experiences, setting a new standard for human-AI collaboration. It moves agent-based systems beyond simple automation and into the realm of true partnership, where the AI can intelligently defer to human judgment when faced with ambiguity or high-stakes decisions. The groundwork has been laid for building far more sophisticated, reliable, and user-friendly applications where the human is always in the loop.
