The vision tool captures a snapshot from the user’s camera feed and adds it as visual context to the LLM during a live conversation. Use it when your assistant needs access to the user’s live video stream.
The vision tool does not work in voice_only_mode, so do not attach it to scenarios you plan to run voice-only.
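Since a vision tool silently fails in voice-only mode, it can help to validate your scenario config before deploying it. The sketch below is a hypothetical helper, not part of any official SDK: the field names `voice_only_mode`, `functions`, and the nested `function`/`type` keys mirror the JSON shape shown in these docs, but the surrounding scenario structure is an assumption about your config layout.

```python
# Hypothetical validation helper; the scenario dict layout is an
# assumption modeled on the JSON examples in these docs.
def find_invalid_vision_tools(scenario: dict) -> list[str]:
    """Return names of vision tools attached to a voice-only scenario."""
    if not scenario.get("voice_only_mode", False):
        return []  # vision tools are fine outside voice-only mode
    return [
        fn["function"]["name"]
        for fn in scenario.get("functions", [])
        if fn.get("function", {}).get("type") == "vision"
    ]

scenario = {
    "voice_only_mode": True,
    "functions": [
        {"function": {"name": "extract_insurance_card_details", "type": "vision"}}
    ],
}
print(find_invalid_vision_tools(scenario))
```

Running such a check in CI (or right before scenario upload) turns a silent runtime failure into an early, explicit error.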

Create a vision tool

Attach via JSON

Add a function with type: "vision" and give it a clear name and description. Both are shown to the LLM as tool metadata, so they should describe precisely when the tool should be called.
{
  "function": {
    "name": "<vision_tool_name>",
    "description": "<description>",
    "type": "vision"
  }
}
For example:
{
  "function": {
    "name": "extract_insurance_card_details",
    "description": "Read the current camera frame and extract insurance card details such as payer name, member ID, and group number.",
    "type": "vision"
  }
}
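If you generate scenario JSON programmatically, the definition above is easy to build with a small factory function. This is a minimal sketch, assuming you assemble the config in Python; only the nested "function" object mirrors the documented JSON shape, and the helper name is hypothetical.

```python
import json

# Hypothetical factory for the documented vision tool JSON shape.
def make_vision_tool(name: str, description: str) -> dict:
    return {
        "function": {
            "name": name,
            "description": description,
            "type": "vision",
        }
    }

tool = make_vision_tool(
    "extract_insurance_card_details",
    "Read the current camera frame and extract insurance card details "
    "such as payer name, member ID, and group number.",
)
print(json.dumps(tool, indent=2))
```

Keeping the construction in one place makes it harder to ship a tool whose `type` field is missing or misspelled.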

Attach via UI

  1. Open your scenario and go to the target node.
  2. Click Add function.
  3. In the modal, select the Vision Tool tab.
  4. Enter a semantically meaningful function name and description.
Attach a vision tool in the scenario UI (shown at 1.5x speed).