Build your own conversation interface
In this approach, you build a custom frontend on top of Akapulu using the Pipecat + Daily stack. Your UI talks to the realtime layer through PipecatClient, RTVI callbacks and events, and the Daily transport. Live audio and video are carried by Daily, while Pipecat/RTVI drives the app-level conversation state your interface reacts to.
We recommend this approach when you need custom branding, custom layout, custom controls, or product-specific UX around the live conversation. For a full working implementation, start with the Custom RTVI UI example.
Using Pipecat and Daily in your custom UI
Conversation lifecycle review
Your custom UI starts by calling the Akapulu connect endpoint, then uses the returned room_url and token to join the Daily call with Pipecat.
Who is in the room
Each conversation room has two core participants:
- Local user participant: the browser user who joins from your frontend (camera/mic input and local UI controls).
- Bot participant: the Akapulu AI assistant that joins the same room as a remote participant.
The /conversations/connect/ API returns the room_url and token your frontend needs.
It also returns conversation_session_id, which you can use for updates polling and session tracking.
Akapulu then continues by starting the bot runtime and having the bot join the same room.
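As a minimal sketch of handling that response, the helper below validates an already-decoded JSON body and picks out the three fields named above. It assumes all three values are non-empty strings. The names ConnectResult and parseConnectResponse are hypothetical and not part of any Akapulu SDK.

```typescript
// Fields this page says the connect endpoint returns.
// Assumption: all three are JSON strings; any other fields are ignored.
interface ConnectResult {
  roomUrl: string;
  token: string;
  conversationSessionId: string;
}

// Hypothetical helper: validate a decoded JSON body and extract the fields.
function parseConnectResponse(body: unknown): ConnectResult {
  if (typeof body !== "object" || body === null) {
    throw new Error("connect response is not a JSON object");
  }
  const record = body as Record<string, unknown>;
  const read = (key: string): string => {
    const value = record[key];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`connect response is missing "${key}"`);
    }
    return value;
  };
  return {
    roomUrl: read("room_url"),
    token: read("token"),
    conversationSessionId: read("conversation_session_id"),
  };
}
```

Failing fast here keeps a malformed response from surfacing later as a confusing join error.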
Render the live video call UI
After client.connect(...) succeeds, you can render the active call UI using the Daily layer from useDaily() and the DailyVideo component.
On macOS, Continuity Camera can be selected automatically as camera or microphone input. If the iPhone is far away or in a pocket, this can lead to a drop in speech recognition quality. If this happens, you can disconnect Continuity Camera for the session, or disable it on iPhone (Settings > General > AirPlay & Handoff > Continuity Camera).
Built-in RTVI events
Pipecat includes built-in RTVI events you can subscribe to in your UI for conversation behavior.
For the full list and payload details, see the RTVI callbacks and events docs. Common built-in examples:
- RTVIEvent.UserTranscript for user transcript updates.
- RTVIEvent.BotTranscript for assistant transcript updates.
- RTVIEvent.UserStartedSpeaking when the user begins speaking.
- RTVIEvent.UserStoppedSpeaking when the user stops speaking.
- RTVIEvent.BotStartedSpeaking when bot speech starts.
- RTVIEvent.BotStoppedSpeaking when bot speech ends.
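One way to turn the four speaking events into UI state is a small pure reducer, sketched below. The string labels mirror the RTVIEvent member names listed above but are local to this example. SpeechEvent, SpeakingFlags, and applySpeechEvent are hypothetical names, and wiring the reducer to the real client events is left to your subscription code.

```typescript
// Local labels mirroring the RTVIEvent member names above (an assumption of
// this sketch, not the enum's runtime values).
type SpeechEvent =
  | "UserStartedSpeaking"
  | "UserStoppedSpeaking"
  | "BotStartedSpeaking"
  | "BotStoppedSpeaking";

interface SpeakingFlags {
  user: boolean;
  bot: boolean;
}

// Return a new flags object reflecting the latest speaking event.
function applySpeechEvent(flags: SpeakingFlags, ev: SpeechEvent): SpeakingFlags {
  switch (ev) {
    case "UserStartedSpeaking":
      return { ...flags, user: true };
    case "UserStoppedSpeaking":
      return { ...flags, user: false };
    case "BotStartedSpeaking":
      return { ...flags, bot: true };
    case "BotStoppedSpeaking":
      return { ...flags, bot: false };
  }
}
```

Keeping the reducer pure lets the same logic drive a React state hook or any other store.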
Akapulu custom ServerMessages
Akapulu also sends custom events through RTVIEvent.ServerMessage for product-specific UI state and tool activity.
Flow node changed
- message.type: "flow-node-changed"
- message.node: current node/stage id in the conversation flow.
Bot speaking state
- message.type: "bot-speaking-state"
- message.state: bot speaking state (speaking or idle).
RAG tool event
- message.type: "RAG"
- message.function_name: RAG tool name.
- message.body.query: query sent to the knowledge base.
Vision tool event
- message.type: "vision"
- message.function_name: vision tool name.
HTTP tool event
- message.type: "http"
- message.function_name: HTTP tool name.
- message.body: HTTP tool request body payload.
Use care with sensitive values in HTTP endpoint templates. The HTTP tool event includes the full request body payload in frontend-visible RTVI messages, so avoid placing secrets in body fields. Keep secrets in headers and follow the Templates and Variables guidance.
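The five message shapes above can be modeled as a discriminated union, as in the sketch below. The field names come from this page; the field types (plain strings, and an untyped HTTP body) are assumptions. AkapuluServerMessage, parseServerMessage, and reduceUiState are hypothetical names. The reducer logs only the HTTP tool name, so the request body never reaches the UI state.

```typescript
type AkapuluServerMessage =
  | { type: "flow-node-changed"; node: string }
  | { type: "bot-speaking-state"; state: "speaking" | "idle" }
  | { type: "RAG"; function_name: string; body: { query: string } }
  | { type: "vision"; function_name: string }
  | { type: "http"; function_name: string; body: unknown };

interface ConversationUiState {
  currentNode: string | null;
  botSpeaking: boolean;
  activity: string[];
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// Return a typed message, or null for anything this sketch does not recognize.
function parseServerMessage(raw: unknown): AkapuluServerMessage | null {
  if (!isRecord(raw)) return null;
  const kind = raw["type"];
  const fn = raw["function_name"];
  const body = raw["body"];
  if (kind === "flow-node-changed") {
    const node = raw["node"];
    return typeof node === "string" ? { type: "flow-node-changed", node } : null;
  }
  if (kind === "bot-speaking-state") {
    const state = raw["state"];
    return state === "speaking" || state === "idle"
      ? { type: "bot-speaking-state", state }
      : null;
  }
  if (typeof fn !== "string") return null; // remaining kinds all name a tool
  if (kind === "RAG") {
    const query = isRecord(body) ? body["query"] : undefined;
    return typeof query === "string"
      ? { type: "RAG", function_name: fn, body: { query } }
      : null;
  }
  if (kind === "vision") return { type: "vision", function_name: fn };
  if (kind === "http") return { type: "http", function_name: fn, body };
  return null;
}

function reduceUiState(
  state: ConversationUiState,
  msg: AkapuluServerMessage,
): ConversationUiState {
  switch (msg.type) {
    case "flow-node-changed":
      return { ...state, currentNode: msg.node };
    case "bot-speaking-state":
      return { ...state, botSpeaking: msg.state === "speaking" };
    case "RAG":
      return {
        ...state,
        activity: [...state.activity, `RAG ${msg.function_name}: ${msg.body.query}`],
      };
    case "vision":
      return { ...state, activity: [...state.activity, `vision ${msg.function_name}`] };
    case "http":
      // Only the tool name is logged; the request body is dropped on purpose.
      return { ...state, activity: [...state.activity, `http ${msg.function_name}`] };
  }
}
```

In a real UI, the parser would run inside whatever handler you register for RTVIEvent.ServerMessage, with unrecognized messages ignored.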
Subscribe to built-in and custom RTVI events
Build a loading display during startup
The bot and room startup process can take 10-15 seconds, so we recommend that your custom UI include a loading state before entering the live call view. During this phase, we suggest showing users:
- status text that reflects current setup progress
- a progress indicator (for example, a progress bar)
- a clear transition into the live UI once readiness is reached
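The three pieces above can be derived from a single progress value, as in this minimal sketch. It assumes your updates polling yields a completion percentage and optionally a status string, and that readiness corresponds to 100 percent. LoadingView, toLoadingView, and the fallback label text are hypothetical.

```typescript
interface LoadingView {
  percent: number; // clamped to 0..100 for the progress bar
  label: string; // status text shown to the user
  ready: boolean; // when true, switch to the live call UI
}

// Build the loading view model from raw progress data.
function toLoadingView(completionPercent: number, statusText?: string): LoadingView {
  // Treat NaN or infinite input as "no progress yet".
  const finite = Number.isFinite(completionPercent) ? completionPercent : 0;
  // Floor rather than round so 99.6 never reports ready early.
  const percent = Math.min(100, Math.max(0, Math.floor(finite)));
  const trimmed = statusText?.trim();
  const label = trimmed ? trimmed : `Starting conversation (${percent}%)`;
  return { percent, label, ready: percent >= 100 };
}
```

Clamping keeps the progress bar stable even if an update arrives with an out-of-range value.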
Recording strategy in a custom UI
After you trigger recording, there is a short initialization delay before capture actually begins. We recommend starting recording when loading progress reaches completion_percent >= 50.
This timing is usually the best balance: early enough to capture the beginning of the live interaction, but late enough to avoid recording too much startup/idle time.
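A minimal sketch of that timing rule: the helper below wraps whatever function starts recording in your app and fires it exactly once, the first time progress reaches the threshold. createRecordingTrigger is a hypothetical name, and the startRecording callback stands in for your own recording call, which is not shown here.

```typescript
// Returns a function to call on every progress update. It invokes
// startRecording at most once and reports whether this call triggered it.
function createRecordingTrigger(
  startRecording: () => void,
  thresholdPercent: number = 50,
): (completionPercent: number) => boolean {
  let started = false;
  return (completionPercent: number): boolean => {
    if (started || !(completionPercent >= thresholdPercent)) {
      return false; // already started, below threshold, or NaN
    }
    started = true;
    startRecording();
    return true;
  };
}
```

The one-shot guard matters because progress updates keep arriving after the threshold is crossed.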
Completed recordings are available on your Akapulu conversations page at akapulu.com/recordings.
Full outline snippet
Here’s a single snippet showing all of the above in one template file.
For a custom engineered implementation, see our Enterprise plan at akapulu.com/pricing.

