Building natural voice conversations with AI agents requires complex infrastructure and a lot of code from engineering teams. Text-based agent interactions follow a turn-based pattern: a user sends a complete request, waits for the agent to process it, and receives a complete response before continuing. Bi-directional streaming overcomes this barrier by establishing a continuous connection that carries data in both directions simultaneously.
Amazon Bedrock AgentCore Runtime supports bi-directional streaming for real-time, two-way communication between users and AI agents. With this capability, agents can simultaneously listen to user input while generating responses, creating a more natural conversational flow. It is particularly suitable for multimodal interactions such as voice and vision agent conversations. The agent can begin responding while still receiving user input, handle interruptions mid-conversation, and adjust its responses based on real-time feedback.
A bi-directional voice chat agent can conduct spoken conversations with the fluidity of human dialogue, so users can interrupt, clarify, or change subjects naturally. These agents process audio input and output streams simultaneously while maintaining conversation state. Building this infrastructure requires managing low-latency connections, handling concurrent audio streams, preserving context across exchanges, and scaling across many concurrent conversations. Implementing these capabilities from the ground up requires months of engineering effort and specialized real-time systems expertise. Amazon Bedrock AgentCore Runtime addresses these challenges by providing a secure, serverless, purpose-built hosting environment to deploy and run AI agents without requiring developers to build and maintain complex streaming infrastructure.
In this post, you will learn the requirements for bi-directional streaming and WebSocket implementations on the AgentCore runtime. You will also learn how to use Strands Agents to implement a bi-directional streaming solution for voice agents.
AgentCore Runtime Bi-Directional Streaming
Bi-directional streaming uses the WebSocket protocol. WebSockets provide full-duplex communications over a single TCP connection, establishing a persistent channel where data flows continuously in both directions. This protocol has broad client support across browsers, mobile applications, and server environments, making it accessible to diverse implementation scenarios.
When a connection is established, the agent can receive user input as a stream while simultaneously sending response segments to the user. The AgentCore runtime manages the underlying infrastructure, handling connections, message ordering, and conversation state in a bi-directional exchange. This reduces the need for developers to build custom streaming infrastructure or manage the complexities of concurrent data flows. Voice conversations differ from text-based interactions in the expectation of a natural flow. When speaking with a voice agent, users expect the same conversational dynamics they experience with humans: the ability to interrupt to correct themselves, ask for clarification, or redirect the conversation without awkward pauses. With bi-directional streaming, voice agents can process incoming audio while generating responses, detecting interruptions, and adjusting behavior in real time. The agent maintains conversation context during these interactions, preserving the conversation thread even when the direction of the conversation changes. This capability helps voice agents transition from turn-based systems to responsive conversational partners.
Beyond voice conversations, bi-directional streaming has many interaction patterns. Interactive debugging sessions allow developers to guide agents through problem-solving in real-time, providing feedback as agents explore solutions. Collaborative agents can work with users on shared tasks, receiving continuous input as the work progresses rather than waiting for complete instructions. Multi-modal agents can process streaming video or sensor data while simultaneously providing analysis and recommendations. Async long-running agent operations can process tasks over minutes or hours while streaming incremental results to clients.
WebSocket implementation
To create a WebSocket implementation on the AgentCore runtime, you should follow certain patterns. First, your container needs to implement a WebSocket endpoint on port 8080 at the /ws path, which aligns with standard WebSocket server practices. This WebSocket endpoint enables a single agent container to serve both the traditional InvokeAgentRuntime API and the new InvokeAgentRuntimeWithWebsocketStream API. Additionally, you must provide a /ping endpoint for health checks.
AgentCore Runtime supports bi-directional streaming over WebSockets for applications built with any standard WebSocket library. The client connects to the service endpoint using a WebSocket protocol connection.
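As a sketch, the client builds the endpoint URL from the agent runtime ARN. The exact URL shape below is an assumption modeled on the HTTP invocation endpoint pattern; confirm the path and query parameters against the AgentCore service documentation:

```python
# Sketch: building a WebSocket URL for InvokeAgentRuntimeWithWebsocketStream.
# The URL format is an assumption based on the HTTP invocation endpoint;
# verify it against the service documentation before use.
from urllib.parse import quote

def websocket_url(region: str, agent_runtime_arn: str, qualifier: str = "DEFAULT") -> str:
    # The runtime ARN must be URL-encoded to appear in the path
    encoded_arn = quote(agent_runtime_arn, safe="")
    return (
        f"wss://bedrock-agentcore.{region}.amazonaws.com"
        f"/runtimes/{encoded_arn}/ws?qualifier={qualifier}"
    )

# Hypothetical ARN for illustration only
url = websocket_url(
    "us-east-1",
    "arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/my-agent",
)
```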
You also need to use one of the supported authentication methods (SigV4 headers, SigV4 pre-signed URLs, or OAuth 2.0) and ensure that the agent application implements the WebSocket service contract specified in the HTTP protocol contract.
Strands Bi-Directional Agent: Simplified Voice Agent Development
Amazon Nova Sonic integrates speech understanding and generation into a single model, delivering human-like conversational AI with low latency, leading accuracy, and strong value performance. Its integrated architecture provides expressive speech generation and real-time transcription in one model, which dynamically adapts responses based on input speech prosody, speed, and timing.
Now that bi-directional streaming is available in the AgentCore runtime, you have two ways to host a voice agent. The first is a direct implementation, where you manage WebSocket connections, parse protocol events, handle audio chunks, and orchestrate async tasks yourself. The second is the Strands bi-directional agent implementation, which removes this complexity by performing these steps automatically.
Example implementation
In this post, we refer to the Amazon Bedrock AgentCore bi-directional streaming code sample, which implements bi-directional communication with Amazon Bedrock AgentCore. The repository contains two implementations: a native Amazon Nova Sonic Python implementation deployed directly to the AgentCore runtime, and a high-level framework implementation using the Strands bi-directional agent for simplified real-time audio conversations.
The following diagram shows a native Amazon Nova Sonic Python WebSocket server running directly on AgentCore. It provides full control over the Nova Sonic protocol, with direct event handling for full visibility into session management, audio streaming, and response generation.
The Strands bi-directional agent framework for real-time audio conversations with Amazon Nova Sonic provides a high-level abstraction that simplifies bi-directional streaming, automated session management, and tool integration. The code snippet below is an example of this simplification.
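The following sketch shows what this can look like. The module paths and class names are assumptions based on the sample repository's experimental API (imports are deferred into `main` so the sketch stays importable without the SDK); check the repository for the current interface:

```python
# Sketch of a Strands bi-directional voice agent: instantiate a model,
# create an agent, and start a streaming session. Module paths and
# class names are assumptions drawn from the experimental API.
import asyncio

async def main():
    # Deferred imports: the bi-directional API is experimental (assumption)
    from strands.experimental.bidirectional_streaming.agent import BidirectionalAgent
    from strands.experimental.bidirectional_streaming.models.novasonic import (
        NovaSonicBidirectionalModel,
    )

    # 1. Instantiate the speech-to-speech model
    model = NovaSonicBidirectionalModel(region="us-east-1")

    # 2. Create the agent with a system prompt (tools can be added too)
    agent = BidirectionalAgent(
        model=model,
        system_prompt="You are a helpful voice assistant.",
    )

    # 3. Start the session; microphone input and speaker output are then
    #    streamed through the agent's input/output streams (omitted here)
    await agent.start()

if __name__ == "__main__":
    asyncio.run(main())
```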
This implementation demonstrates the simplicity of Strands: instantiate a model, create an agent with tools and a system prompt, and run it with input/output streams. The framework handles protocol complexity internally.
Following is the agent declaration section in the code:
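A sketch of that declaration is shown below. The tool and the constructor arguments are illustrative assumptions, not the repository's exact code; tools are plain Python functions passed to the agent:

```python
# Agent declaration sketch: tools go straight into the constructor and
# Strands orchestrates function calling. Names here are illustrative
# assumptions, not the sample repository's exact code.
def get_weather(city: str) -> str:
    """Example tool: return a canned weather report for a city."""
    return f"It is sunny in {city}."

def build_agent():
    # Deferred imports: experimental module paths are assumptions
    from strands.experimental.bidirectional_streaming.agent import BidirectionalAgent
    from strands.experimental.bidirectional_streaming.models.novasonic import (
        NovaSonicBidirectionalModel,
    )
    return BidirectionalAgent(
        model=NovaSonicBidirectionalModel(region="us-east-1"),
        tools=[get_weather],  # tools passed directly to the constructor
        system_prompt="You are a helpful voice assistant.",
    )
```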
Tools are passed directly to the agent's constructor, and Strands handles function-calling orchestration automatically. In short, the basic WebSocket implementation of the same functionality requires about 150 lines of code, while the Strands implementation reduces this to about 20 lines focused on business logic. Developers can focus on defining agent behavior, integrating tools, and crafting system prompts rather than managing WebSocket connections, parsing events, handling audio chunks, or orchestrating async tasks. This makes bi-directional streaming accessible to developers without specialized real-time systems expertise while maintaining full access to Nova Sonic's audio conversation capabilities.

The Strands bi-directional feature is currently supported only in the Python SDK. If you need maximum flexibility in your voice agent implementation, the native Amazon Nova Sonic implementation is a better fit: it gives you complete control over every step of the process, which matters when your agent-to-model communication follows many different patterns. The framework approach, by contrast, lets the SDK manage dependencies and provides consistency across the system. The same Strands bi-directional agent code structure works with Nova Sonic, the OpenAI Realtime API, and Google Gemini Live: developers simply swap out the model implementation while keeping the rest of their code unchanged.
Conclusion
The bi-directional streaming capability of the Amazon Bedrock AgentCore runtime transforms how developers create conversational AI agents. By providing WebSocket-based real-time communications infrastructure, AgentCore removes the months of engineering effort required to implement a streaming system from scratch. The runtime enables developers to deploy a variety of voice agents, from native protocol implementations using Amazon Nova Sonic to higher-level frameworks like the Strands bi-directional agent, in a single secure, serverless environment.
About the authors
Lana Zhang is a Senior Specialist Solutions Architect for Generative AI at AWS within the Worldwide Specialist Organization. She specializes in AI/ML with a focus on use cases such as AI voice assistants and multimodal understanding. She works closely with customers in various industries, including media and entertainment, gaming, sports, advertising, financial services, and healthcare, to help transform their business solutions through AI.
Felipe Fabres is a Senior Specialist Solutions Architect for Generative AI at AWS for Startups. He specializes in AI/ML with a focus on agentic systems and the end-to-end training and inference process. He holds a Ph.D. in graph theory and has over 10 years of experience in software development, from monoliths to event-driven architectures.
Evandro Franco is a Senior Data Scientist at Amazon Web Services. He is part of the global GTM team that helps AWS customers address business challenges related to AI/ML on AWS, primarily with Amazon Bedrock AgentCore and Strands Agents. He has over 18 years of experience with technology, ranging from software development, infrastructure, and serverless to machine learning. In his spare time, Evandro likes to play with his son, mainly building fun Lego sets.