The Realtime API lets you hold low-latency, bidirectional voice and text conversations with a model over a WebSocket connection. Instead of sending discrete HTTP requests, you open a persistent connection and exchange event messages in both directions: you send audio or text input and receive audio or text responses streamed as they are generated. Anyone’s Realtime endpoint is compatible with the OpenAI Realtime API format and also supports Azure OpenAI Realtime.
Realtime requires a channel configured with a Realtime-capable provider (OpenAI or Azure OpenAI). Contact your Anyone administrator if the WebSocket connection is rejected.
Connecting
Endpoint: `GET /v1/realtime`
Upgrade an HTTP GET request to a WebSocket connection. You can pass your API key as a query parameter or as an Authorization header in the WebSocket handshake.
Authentication
Pass your API key in one of the following ways:

- Query parameter: `?token=YOUR_TOKEN`
- Authorization header: `Authorization: Bearer YOUR_TOKEN` (set during the HTTP upgrade handshake)
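A minimal sketch of both authentication options. The helper names and the base URL you pass in are hypothetical; only the `token` query parameter and the `Bearer` scheme come from the list above. The browser `WebSocket` constructor cannot set custom handshake headers, so in browsers the query parameter is the practical choice.

```javascript
// Option 1: pass the API key as the `token` query parameter.
// baseUrl is your deployment's host (hypothetical here), e.g. "https://api.example.com".
function realtimeUrlWithToken(baseUrl, token) {
  const url = new URL("/v1/realtime", baseUrl);
  // WebSocket clients expect ws:// or wss:// rather than http:// or https://.
  if (url.protocol === "https:") {
    url.protocol = "wss:";
  } else if (url.protocol === "http:") {
    url.protocol = "ws:";
  }
  // URLSearchParams percent-encodes any special characters in the token.
  url.searchParams.set("token", token);
  return url.toString();
}

// Option 2: send the API key as a bearer token in the upgrade request headers.
// Only clients that can set handshake headers (not the browser WebSocket API) can use this.
function realtimeAuthHeaders(token) {
  return { Authorization: `Bearer ${token}` };
}
```

For example, `realtimeUrlWithToken("https://api.example.com", "YOUR_TOKEN")` returns `wss://api.example.com/v1/realtime?token=YOUR_TOKEN`, where `api.example.com` stands in for your actual host.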
Event types
Once connected, you exchange JSON event messages. Each message has a `type` field that identifies its purpose. The following are the core event types.
Client → server
| Event type | Description |
|---|---|
| `session.update` | Configure session parameters such as voice, audio format, tools, and instructions. |
| `input_audio_buffer.append` | Stream base64-encoded audio bytes to the model’s input buffer. |
| `conversation.item.create` | Add a text message to the conversation. |
| `response.create` | Prompt the model to generate a response based on the current conversation and buffer. |
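The four client events can be built as plain objects and sent as JSON text frames. This is a sketch that assumes the payload shapes of the OpenAI Realtime format (the `audio` field, and the `item` with `role` and `input_text` content) carry over to Anyone unchanged; the helper names are hypothetical.

```javascript
// Encode raw audio bytes (e.g. pcm16 samples) as base64.
// btoa is available in browsers and as a global in Node.js 16+.
function toBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Configure the session; `session` holds the parameters you want to change.
function sessionUpdate(session) {
  return { type: "session.update", session };
}

// Append a chunk of audio to the model's input buffer.
function inputAudioAppend(audioBytes) {
  return { type: "input_audio_buffer.append", audio: toBase64(audioBytes) };
}

// Add a user text message to the conversation (OpenAI-style item shape, assumed).
function conversationItemCreate(text) {
  return {
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text }],
    },
  };
}

// Ask the model to respond to the conversation so far.
function responseCreate() {
  return { type: "response.create" };
}

// Every event travels as a JSON text frame, e.g. ws.send(serialize(responseCreate())).
function serialize(event) {
  return JSON.stringify(event);
}
```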
Server → client
| Event type | Description |
|---|---|
| `session.created` | Sent immediately after connecting, confirming the session is ready. |
| `session.updated` | Confirms that a `session.update` was applied. |
| `response.audio.delta` | A chunk of base64-encoded audio from the model’s response. |
| `response.audio_transcript.delta` | A chunk of the text transcript of the model’s audio output. |
| `response.function_call_arguments.delta` | Streamed function call arguments, when the model calls a tool. |
| `response.function_call_arguments.done` | Signals that function call arguments are complete. |
| `response.done` | Signals that the model has finished generating a response. Contains usage information. |
| `conversation.item.created` | Confirms that a conversation item was added. |
| `error` | An error occurred. Contains an error object with a message and code. |
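A client typically dispatches on each server event's `type` and accumulates the streamed deltas. The sketch below assumes the OpenAI Realtime payload fields (`delta`, `call_id`, `arguments`, `response.usage`, `error`); the handler name and its `state` layout are hypothetical.

```javascript
function createRealtimeHandler() {
  const state = {
    transcript: "",   // concatenated response.audio_transcript.delta text
    audioChunks: [],  // base64 audio chunks from response.audio.delta
    functionArgs: {}, // call_id -> accumulated JSON argument string
    usage: null,      // usage object from the latest response.done
    errors: [],       // error objects from error events
  };

  // Parse one raw JSON message from the socket and update state.
  function handle(raw) {
    const event = JSON.parse(raw);
    switch (event.type) {
      case "response.audio.delta":
        state.audioChunks.push(event.delta);
        break;
      case "response.audio_transcript.delta":
        state.transcript += event.delta;
        break;
      case "response.function_call_arguments.delta":
        state.functionArgs[event.call_id] =
          (state.functionArgs[event.call_id] ?? "") + event.delta;
        break;
      case "response.function_call_arguments.done":
        // The done event carries the complete argument string; prefer it over the deltas.
        state.functionArgs[event.call_id] = event.arguments;
        break;
      case "response.done":
        // Usage location assumed from the OpenAI format (event.response.usage).
        state.usage = event.response?.usage ?? null;
        break;
      case "error":
        state.errors.push(event.error);
        break;
      default:
        // session.created, session.updated, conversation.item.created, and others
        // need no bookkeeping in this sketch.
        break;
    }
    return event;
  }

  return { state, handle };
}
```

With a connected socket, wire it up as `ws.addEventListener("message", (m) => handler.handle(m.data))`. Accumulating function arguments per `call_id` keeps parallel tool calls from interleaving into one string.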
Session configuration
After connecting, send a `session.update` event to configure the session:
- The interaction modes to enable, for example `["text", "audio"]`.
- System-level instructions that guide the model’s behavior for the session.
- The voice to use for audio output, for example `alloy`, `echo`, `nova`, or `shimmer`.
- The format of the audio you send, for example `pcm16`, `g711_ulaw`, or `g711_alaw`.
- The format of the audio the model returns, for example `pcm16`.
- Configuration for transcribing your audio input.
- How the server detects the end of a turn in audio input. Set it to `null` to disable automatic turn detection and manage turns manually.
- A list of tool definitions available to the model during this session, following the OpenAI function-calling schema.
- When the model uses tools: `auto`, `none`, or a specific tool name.
- The sampling temperature for the model. Defaults to `0.8`.

Usage tracking
When the model finishes a response, the `response.done` event includes a `usage` object:
- Total tokens consumed by this response turn.
- Tokens in the input (audio and text).
- Tokens in the model’s output (audio and text).
- A breakdown of input token types (for example, cached or audio tokens).
- A breakdown of output token types (for example, audio or text tokens).
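To track consumption over a whole session, you can sum the per-turn usage from each `response.done` event. This sketch assumes the OpenAI Realtime field names `total_tokens`, `input_tokens`, and `output_tokens` and that the usage object sits at `event.response.usage`; check these against the events your deployment actually returns. The helper names are hypothetical.

```javascript
function emptyTotals() {
  return { total_tokens: 0, input_tokens: 0, output_tokens: 0 };
}

// Return new totals with one response's usage added; missing fields count as 0.
function addUsage(totals, usage) {
  return {
    total_tokens: totals.total_tokens + (usage?.total_tokens ?? 0),
    input_tokens: totals.input_tokens + (usage?.input_tokens ?? 0),
    output_tokens: totals.output_tokens + (usage?.output_tokens ?? 0),
  };
}

// Fold a list of parsed server events into totals, counting only response.done events.
function totalUsage(events) {
  return events
    .filter((event) => event.type === "response.done")
    .reduce((totals, event) => addUsage(totals, event.response?.usage), emptyTotals());
}
```

Returning fresh totals instead of mutating the accumulator keeps each turn's contribution easy to log or compare.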
Example
The following JavaScript example connects to the Realtime endpoint, configures a session, and logs events as they arrive.
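A minimal sketch of such a client, assuming a global `WebSocket` (browsers, or Node.js 22 and later) and query-parameter authentication. `REALTIME_URL` is a hypothetical placeholder for your Anyone host, and the session fields use the OpenAI Realtime `session.update` names, which are an assumption here rather than a confirmed list.

```javascript
// Hypothetical placeholder: replace with your Anyone host.
const REALTIME_URL = "wss://your-anyone-host.example/v1/realtime";

// Session fields follow the OpenAI Realtime session.update schema (assumed).
function buildSessionUpdate() {
  return {
    type: "session.update",
    session: {
      modalities: ["text", "audio"],
      instructions: "You are a concise, friendly assistant.",
      voice: "alloy",
      input_audio_format: "pcm16",
      output_audio_format: "pcm16",
      temperature: 0.8,
    },
  };
}

function connect(token) {
  const url = new URL(REALTIME_URL);
  url.searchParams.set("token", token); // query-parameter authentication
  const ws = new WebSocket(url.toString());
  const send = (event) => ws.send(JSON.stringify(event));

  ws.addEventListener("message", (message) => {
    const event = JSON.parse(message.data);
    console.log("event:", event.type);

    switch (event.type) {
      case "session.created":
        // The session is ready: configure it, add a user message, and request a response.
        send(buildSessionUpdate());
        send({
          type: "conversation.item.create",
          item: { type: "message", role: "user", content: [{ type: "input_text", text: "Hello!" }] },
        });
        send({ type: "response.create" });
        break;
      case "response.audio_transcript.delta":
        console.log("transcript:", event.delta);
        break;
      case "response.done":
        // Usage location assumed from the OpenAI format.
        console.log("usage:", event.response?.usage);
        ws.close();
        break;
      case "error":
        console.error("error:", event.error?.code, event.error?.message);
        break;
    }
  });

  ws.addEventListener("close", () => console.log("connection closed"));
  return ws;
}
```

In Node.js, start it with something like `connect(process.env.ANYONE_API_KEY)`, where `ANYONE_API_KEY` is a hypothetical environment variable holding your API key. Waiting for `session.created` before sending anything ensures the socket is open and the session exists.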

