Agents are systems that leverage Gemini models, a set of tools, and reasoning capabilities to perform complex, multi-step tasks and achieve specific goals. Unlike a single model call, an agent can plan, execute a series of actions, interact with external systems, and synthesize information to fulfill a user's request.
CreateAgent
Creates a new Agent (Typed version for SDK).
Request body
The request body contains data with the following structure:
The unique identifier for the agent.
The base agent to extend.
System instruction for the agent.
Agent description for developers to quickly read and understand.
tools AgentTool (optional)
The tools available to the agent.
Possible Types
Polymorphic discriminator: type
CodeExecution
A tool that can be used by the model to execute code.
No description provided.
Always set to "code_execution".
GoogleSearch
A tool that can be used by the model to search Google.
No description provided.
Always set to "google_search".
The types of search grounding to enable.
Possible values:
- web_search: Setting this field enables web search. Only text results are returned.
- image_search: Setting this field enables image search. Image bytes are returned.
- enterprise_web_search: Setting this field enables enterprise web search.
UrlContext
A tool that can be used by the model to fetch URL context.
No description provided.
Always set to "url_context".
McpServer
An MCPServer is a server that can be called by the model to perform actions.
No description provided.
Always set to "mcp_server".
The name of the MCPServer.
The full URL for the MCPServer endpoint. Example: "https://api.example.com/mcp"
Optional: Fields for authentication headers, timeouts, etc., if needed.
allowed_tools AllowedTools (optional)
The allowed tools.
Fields
The mode of the tool choice.
Possible values:
- auto: Auto tool choice.
- any: Any tool choice.
- none: No tool choice.
- validated: Validated tool choice.
The names of the allowed tools.
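The tool variants above are told apart by their `type` value. As a minimal sketch, a client could check a tools list against the four documented values before sending it. `validate_tools` is a hypothetical helper written for this illustration, not part of any SDK.

```python
# Tool type values taken from the "Always set to ..." lines above.
KNOWN_TOOL_TYPES = {"code_execution", "google_search", "url_context", "mcp_server"}


def validate_tools(tools):
    """Return a list of error strings; an empty list means every entry has a known type."""
    errors = []
    for index, tool in enumerate(tools):
        tool_type = tool.get("type")
        if tool_type not in KNOWN_TOOL_TYPES:
            errors.append(f"tools[{index}]: unknown type {tool_type!r}")
    return errors


tools = [{"type": "code_execution"}, {"type": "google_search"}]
print(validate_tools(tools))
```

Checking the discriminator client-side only catches typos early; the service remains the authority on which tool types it accepts.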
base_environment EnvironmentConfig (optional)
The environment configuration for the agent.
Fields
No description provided.
Always set to "remote".
sources Source (optional)
No description provided.
Fields
No description provided.
Possible values:
- gcs: A GCS bucket.
- inline: Inline content.
- repository: A generic repository. The protocol prefix in the source URL identifies the provider (e.g., github://, gcs://).
- skill_registry: A skill resource from the Skill Registry Service. Skill: projects/{project}/locations/{location}/skills/{skill}. SkillRevision: projects/{project}/locations/{location}/skills/{skill}/revisions/{revision}. Supports mounting all skills under a project: projects/{project}/locations/{location}/skills.
The source of the environment. For GCS, this is the GCS path. For GitHub, this is the GitHub path.
Where the source should appear in the environment.
The inline content if `type` is `inline`.
Optional encoding for inline content (e.g. `base64`).
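The source fields above can be illustrated with an inline source whose content is base64-encoded. This is a sketch under stated assumptions: only `type` and its `inline` value appear in this reference, while the key names `content`, `encoding`, and `target` are guesses at the stripped field names, and `inline_source` is a hypothetical helper.

```python
import base64
import json


def inline_source(text, target):
    """Build an inline source entry with base64-encoded content."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return {
        "type": "inline",      # documented enum value
        "content": encoded,    # assumed field name
        "encoding": "base64",  # assumed field name; "base64" is the documented example value
        "target": target,      # assumed field name for where the source appears
    }


source = inline_source("print('hello')\n", "/workspace/template.py")
print(json.dumps(source, indent=2))
```

Base64 keeps arbitrary bytes safe inside a JSON string, at the cost of roughly a third more payload size.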
network EnvironmentNetworkEgressAllowlist (optional)
Network configuration for the environment.
Possible Types
object
Outbound networking configuration for the sandbox. When specified, restricts which external domains the sandbox can reach. Omit entirely to allow all outbound traffic with no header injection.
allowlist AllowlistEntry (optional)
List of allowed outbound domains. Only requests to listed domains are permitted. Use [{'domain': '*'}] to allow all domains while still injecting headers on specific ones.
Fields
Domain to allow outbound requests to. Supports wildcards (e.g. '*.googleapis.com'). Use '*' to allow all domains.
Headers to inject on all outbound requests matching this domain. Each entry is a flat {header_name: header_value} object. The egress proxy injects these automatically.
string
Turns all network off.
Possible values
- disabled: Turns all network off.
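The three network modes described above (omitted, an allowlist object, or the string `disabled`) can be approximated client-side to preview which hosts a configuration would permit. This is only a sketch: the egress proxy's exact wildcard rules are not specified here, so shell-style matching via `fnmatch` is an assumption, and `is_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_allowed(host, network):
    """Approximate whether `host` would be reachable under a network setting.

    network is None     -> field omitted, all outbound traffic allowed
    network == "disabled" -> all network off
    network is a dict   -> only domains matching an allowlist entry are allowed
    """
    if network is None:
        return True
    if network == "disabled":
        return False
    entries = network.get("allowlist", [])
    # Assumption: wildcards behave like shell globs and matching is case-insensitive.
    return any(fnmatchcase(host.lower(), entry["domain"].lower()) for entry in entries)


network = {"allowlist": [{"domain": "*.googleapis.com"}]}
print(is_allowed("storage.googleapis.com", network))
print(is_allowed("example.com", network))
```

Note that under glob matching `*.googleapis.com` does not match the bare `googleapis.com`; whether the real proxy behaves the same way should be confirmed before relying on it.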
Response
If successful, the response body contains data with the following structure:
The unique identifier for the agent.
The base agent to extend.
System instruction for the agent.
Agent description for developers to quickly read and understand.
tools AgentTool (optional)
The tools available to the agent.
Possible Types
Polymorphic discriminator: type
CodeExecution
A tool that can be used by the model to execute code.
No description provided.
Always set to "code_execution".
GoogleSearch
A tool that can be used by the model to search Google.
No description provided.
Always set to "google_search".
The types of search grounding to enable.
Possible values:
- web_search: Setting this field enables web search. Only text results are returned.
- image_search: Setting this field enables image search. Image bytes are returned.
- enterprise_web_search: Setting this field enables enterprise web search.
UrlContext
A tool that can be used by the model to fetch URL context.
No description provided.
Always set to "url_context".
McpServer
An MCPServer is a server that can be called by the model to perform actions.
No description provided.
Always set to "mcp_server".
The name of the MCPServer.
The full URL for the MCPServer endpoint. Example: "https://api.example.com/mcp"
Optional: Fields for authentication headers, timeouts, etc., if needed.
allowed_tools AllowedTools (optional)
The allowed tools.
Fields
The mode of the tool choice.
Possible values:
- auto: Auto tool choice.
- any: Any tool choice.
- none: No tool choice.
- validated: Validated tool choice.
The names of the allowed tools.
base_environment EnvironmentConfig (optional)
The environment configuration for the agent.
Fields
No description provided.
Always set to "remote".
sources Source (optional)
No description provided.
Fields
No description provided.
Possible values:
- gcs: A GCS bucket.
- inline: Inline content.
- repository: A generic repository. The protocol prefix in the source URL identifies the provider (e.g., github://, gcs://).
- skill_registry: A skill resource from the Skill Registry Service. Skill: projects/{project}/locations/{location}/skills/{skill}. SkillRevision: projects/{project}/locations/{location}/skills/{skill}/revisions/{revision}. Supports mounting all skills under a project: projects/{project}/locations/{location}/skills.
The source of the environment. For GCS, this is the GCS path. For GitHub, this is the GitHub path.
Where the source should appear in the environment.
The inline content if `type` is `inline`.
Optional encoding for inline content (e.g. `base64`).
network EnvironmentNetworkEgressAllowlist (optional)
Network configuration for the environment.
Possible Types
object
Outbound networking configuration for the sandbox. When specified, restricts which external domains the sandbox can reach. Omit entirely to allow all outbound traffic with no header injection.
allowlist AllowlistEntry (optional)
List of allowed outbound domains. Only requests to listed domains are permitted. Use [{'domain': '*'}] to allow all domains while still injecting headers on specific ones.
Fields
Domain to allow outbound requests to. Supports wildcards (e.g. '*.googleapis.com'). Use '*' to allow all domains.
Headers to inject on all outbound requests matching this domain. Each entry is a flat {header_name: header_value} object. The egress proxy injects these automatically.
string
Turns all network off.
Possible values
- disabled: Turns all network off.
Create Agent
Example Response
{
  "id": "ag_abc123",
  "display_name": "My Research Agent",
  "system_instruction": "You are a helpful research assistant.",
  "tools": [
    { "type": "google_search" }
  ],
  "object": "agent",
  "created": "2025-11-26T12:25:15Z",
  "updated": "2025-11-26T12:25:15Z"
}
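As a small sketch of consuming this response, the example payload can be parsed with the standard library, reading the ISO 8601 timestamps into timezone-aware datetimes. `parse_timestamp` is a hypothetical helper; the JSON is the example response copied verbatim.

```python
import json
from datetime import datetime, timezone

raw = """
{
  "id": "ag_abc123",
  "display_name": "My Research Agent",
  "system_instruction": "You are a helpful research assistant.",
  "tools": [{"type": "google_search"}],
  "object": "agent",
  "created": "2025-11-26T12:25:15Z",
  "updated": "2025-11-26T12:25:15Z"
}
"""


def parse_timestamp(value):
    """Parse a timestamp such as 2025-11-26T12:25:15Z into an aware datetime."""
    # Older Python versions do not accept a trailing "Z", so map it to an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


agent = json.loads(raw)
created = parse_timestamp(agent["created"])
tool_types = [tool["type"] for tool in agent.get("tools", [])]
print(agent["id"], created.isoformat(), tool_types)
```

Using `agent.get("tools", [])` reflects that `tools` is optional and is absent from some of the other example responses.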
Agent with Sources
Example Response
{
  "id": "data-analyst-abc123",
  "system_instruction": "You are a data analyst. Always include visualizations and export results as PDF.",
  "object": "agent",
  "created": "2025-11-26T12:25:15Z",
  "updated": "2025-11-26T12:25:15Z"
}
Agent Forked from Environment
Example Response
{
  "id": "my-data-analyst",
  "system_instruction": "You are a data analyst. Use the template at /workspace/template.py for all reports.",
  "object": "agent",
  "created": "2025-11-26T12:25:15Z",
  "updated": "2025-11-26T12:25:15Z"
}
ListAgents
Lists all Agents.
Path / Query Parameters
No description provided.
No description provided.
No description provided.
Response
If successful, the response body contains data with the following structure:
No description provided.
No description provided.
List Agents
Example Response
{
  "object": "list",
  "data": [
    {
      "id": "ag_abc123",
      "display_name": "My Research Agent",
      "system_instruction": "You are a helpful research assistant.",
      "object": "agent",
      "created": "2025-11-26T12:25:15Z",
      "updated": "2025-11-26T12:25:15Z"
    }
  ]
}
GetAgent
Gets a specific Agent.
Path / Query Parameters
No description provided.
Response
If successful, the response body contains data with the following structure:
The unique identifier for the agent.
The base agent to extend.
System instruction for the agent.
Agent description for developers to quickly read and understand.
tools AgentTool (optional)
The tools available to the agent.
Possible Types
Polymorphic discriminator: type
CodeExecution
A tool that can be used by the model to execute code.
No description provided.
Always set to "code_execution".
GoogleSearch
A tool that can be used by the model to search Google.
No description provided.
Always set to "google_search".
The types of search grounding to enable.
Possible values:
- web_search: Setting this field enables web search. Only text results are returned.
- image_search: Setting this field enables image search. Image bytes are returned.
- enterprise_web_search: Setting this field enables enterprise web search.
UrlContext
A tool that can be used by the model to fetch URL context.
No description provided.
Always set to "url_context".
McpServer
An MCPServer is a server that can be called by the model to perform actions.
No description provided.
Always set to "mcp_server".
The name of the MCPServer.
The full URL for the MCPServer endpoint. Example: "https://api.example.com/mcp"
Optional: Fields for authentication headers, timeouts, etc., if needed.
allowed_tools AllowedTools (optional)
The allowed tools.
Fields
The mode of the tool choice.
Possible values:
- auto: Auto tool choice.
- any: Any tool choice.
- none: No tool choice.
- validated: Validated tool choice.
The names of the allowed tools.
base_environment EnvironmentConfig (optional)
The environment configuration for the agent.
Fields
No description provided.
Always set to "remote".
sources Source (optional)
No description provided.
Fields
No description provided.
Possible values:
- gcs: A GCS bucket.
- inline: Inline content.
- repository: A generic repository. The protocol prefix in the source URL identifies the provider (e.g., github://, gcs://).
- skill_registry: A skill resource from the Skill Registry Service. Skill: projects/{project}/locations/{location}/skills/{skill}. SkillRevision: projects/{project}/locations/{location}/skills/{skill}/revisions/{revision}. Supports mounting all skills under a project: projects/{project}/locations/{location}/skills.
The source of the environment. For GCS, this is the GCS path. For GitHub, this is the GitHub path.
Where the source should appear in the environment.
The inline content if `type` is `inline`.
Optional encoding for inline content (e.g. `base64`).
network EnvironmentNetworkEgressAllowlist (optional)
Network configuration for the environment.
Possible Types
object
Outbound networking configuration for the sandbox. When specified, restricts which external domains the sandbox can reach. Omit entirely to allow all outbound traffic with no header injection.
allowlist AllowlistEntry (optional)
List of allowed outbound domains. Only requests to listed domains are permitted. Use [{'domain': '*'}] to allow all domains while still injecting headers on specific ones.
Fields
Domain to allow outbound requests to. Supports wildcards (e.g. '*.googleapis.com'). Use '*' to allow all domains.
Headers to inject on all outbound requests matching this domain. Each entry is a flat {header_name: header_value} object. The egress proxy injects these automatically.
string
Turns all network off.
Possible values
- disabled: Turns all network off.
Get Agent
Example Response
{
  "id": "ag_abc123",
  "display_name": "My Research Agent",
  "system_instruction": "You are a helpful research assistant.",
  "tools": [
    { "type": "google_search" }
  ],
  "object": "agent",
  "created": "2025-11-26T12:25:15Z",
  "updated": "2025-11-26T12:25:15Z"
}
DeleteAgent
Deletes an Agent.
Path / Query Parameters
No description provided.
Response
If successful, the response is empty.
Delete Agent
Resources
Agent
An agent definition for the CreateAgent API. This message is the target for annotation-parser-based JSON parsing. New format:
{
  "id": "customer-sentinel",
  "base_agent": "",
  "system_instruction": "...",
  "base_environment": { "type": "remote", "sources": [...] },
  "tools": [ {"type": "code_execution"} ]
}
Fields
The unique identifier for the agent.
The base agent to extend.
System instruction for the agent.
Agent description for developers to quickly read and understand.
tools AgentTool (optional)
The tools available to the agent.
Possible Types
Polymorphic discriminator: type
CodeExecution
A tool that can be used by the model to execute code.
No description provided.
Always set to "code_execution".
GoogleSearch
A tool that can be used by the model to search Google.
No description provided.
Always set to "google_search".
The types of search grounding to enable.
Possible values:
- web_search: Setting this field enables web search. Only text results are returned.
- image_search: Setting this field enables image search. Image bytes are returned.
- enterprise_web_search: Setting this field enables enterprise web search.
UrlContext
A tool that can be used by the model to fetch URL context.
No description provided.
Always set to "url_context".
McpServer
An MCPServer is a server that can be called by the model to perform actions.
No description provided.
Always set to "mcp_server".
The name of the MCPServer.
The full URL for the MCPServer endpoint. Example: "https://api.example.com/mcp"
Optional: Fields for authentication headers, timeouts, etc., if needed.
allowed_tools AllowedTools (optional)
The allowed tools.
Fields
The mode of the tool choice.
Possible values:
- auto: Auto tool choice.
- any: Any tool choice.
- none: No tool choice.
- validated: Validated tool choice.
The names of the allowed tools.
base_environment EnvironmentConfig (optional)
The environment configuration for the agent.
Fields
No description provided.
Always set to "remote".
sources Source (optional)
No description provided.
Fields
No description provided.
Possible values:
- gcs: A GCS bucket.
- inline: Inline content.
- repository: A generic repository. The protocol prefix in the source URL identifies the provider (e.g., github://, gcs://).
- skill_registry: A skill resource from the Skill Registry Service. Skill: projects/{project}/locations/{location}/skills/{skill}. SkillRevision: projects/{project}/locations/{location}/skills/{skill}/revisions/{revision}. Supports mounting all skills under a project: projects/{project}/locations/{location}/skills.
The source of the environment. For GCS, this is the GCS path. For GitHub, this is the GitHub path.
Where the source should appear in the environment.
The inline content if `type` is `inline`.
Optional encoding for inline content (e.g. `base64`).
network EnvironmentNetworkEgressAllowlist (optional)
Network configuration for the environment.
Possible Types
object
Outbound networking configuration for the sandbox. When specified, restricts which external domains the sandbox can reach. Omit entirely to allow all outbound traffic with no header injection.
allowlist AllowlistEntry (optional)
List of allowed outbound domains. Only requests to listed domains are permitted. Use [{'domain': '*'}] to allow all domains while still injecting headers on specific ones.
Fields
Domain to allow outbound requests to. Supports wildcards (e.g. '*.googleapis.com'). Use '*' to allow all domains.
Headers to inject on all outbound requests matching this domain. Each entry is a flat {header_name: header_value} object. The egress proxy injects these automatically.
string
Turns all network off.
Possible values
- disabled: Turns all network off.
Data Models
InteractionSseEvent
Possible Types
Polymorphic discriminator: event_type
InteractionCreatedEvent
No description provided.
Always set to "interaction.created".
The event_id token to be used to resume the interaction stream, from this event.
metadata StreamMetadata (optional)
Optional metadata accompanying ANY streamed event.
Fields
total_usage Usage (optional)
No description provided.
Fields
Number of tokens in the prompt (context).
input_tokens_by_modality ModalityTokens (optional)
A breakdown of input token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens in the cached part of the prompt (the cached content).
cached_tokens_by_modality ModalityTokens (optional)
A breakdown of cached token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Total number of tokens across all the generated responses.
output_tokens_by_modality ModalityTokens (optional)
A breakdown of output token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens present in tool-use prompt(s).
tool_use_tokens_by_modality ModalityTokens (optional)
A breakdown of tool-use token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens of thoughts for thinking models.
Total token count for the interaction request (prompt + responses + other internal tokens).
grounding_tool_count GroundingToolCount (optional)
Grounding tool count.
Fields
The grounding tool type associated with the count.
Possible values:
- google_search: Grounding with Google Web Search and Image Search, and Web Grounding for Enterprise.
- google_maps: Grounding with Google Maps.
- retrieval: Grounding with customer's data, for example, VertexAISearch.
The count for this grounding tool type.
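The per-modality breakdowns above are lists of entries, each pairing a modality with a token count. A minimal sketch of aggregating them follows. The key `tokens` is an assumed name for the stripped count field, `total_by_modality` is a hypothetical helper, and the numbers are made up for illustration.

```python
def total_by_modality(entries):
    """Sum token counts per modality from a list of ModalityTokens-like dicts."""
    totals = {}
    for entry in entries:
        modality = entry["modality"]
        # "tokens" is an assumed field name for the per-modality count.
        totals[modality] = totals.get(modality, 0) + entry["tokens"]
    return totals


# Illustrative usage object; values are invented, not taken from a real response.
usage = {
    "input_tokens_by_modality": [
        {"modality": "text", "tokens": 120},
        {"modality": "image", "tokens": 258},
    ],
    "output_tokens_by_modality": [{"modality": "text", "tokens": 40}],
}

combined = total_by_modality(
    usage["input_tokens_by_modality"] + usage["output_tokens_by_modality"]
)
print(combined)
```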
interaction InteractionSseEventInteraction (required)
Partial interaction resource emitted when the stream is created.
Fields
Required. Output only. A unique identifier for the interaction completion.
Output only. The resource type.
The model that will complete your prompt.
The agent to interact with.
Required. Output only. The status of the interaction.
Possible values:
- in_progress: The interaction is in progress.
- requires_action: The interaction requires action/input from the user.
- completed: The interaction is completed.
- failed: The interaction failed.
- cancelled: The interaction was cancelled.
- incomplete: The interaction is completed, but contains incomplete results (e.g. hitting max_tokens).
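One way a client might act on these status values is sketched below. Treating `completed`, `failed`, `cancelled`, and `incomplete` as terminal is an interpretation of the descriptions above rather than documented behavior, and `next_action` is a hypothetical helper.

```python
# Assumption: these statuses mean no further events will change the outcome.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


def next_action(status):
    """Map an interaction status to what a client loop should do next."""
    if status == "in_progress":
        return "wait"
    if status == "requires_action":
        return "provide_input"
    if status in TERMINAL_STATUSES:
        return "stop"
    raise ValueError(f"unknown status: {status!r}")


for status in ("in_progress", "requires_action", "incomplete"):
    print(status, "->", next_action(status))
```

Raising on unknown values makes it obvious when the service introduces a status the client has not been updated for.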
Output only. The time at which the response was created in ISO 8601 format.
Output only. The time at which the response was last updated in ISO 8601 format.
service_tier ServiceTier (optional)
The service tier for the interaction.
Possible values
- flex: Flex service tier.
- standard: Standard service tier.
- priority: Priority service tier.
usage Usage (optional)
Output only. Statistics on the interaction request's token usage.
Fields
Number of tokens in the prompt (context).
input_tokens_by_modality ModalityTokens (optional)
A breakdown of input token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens in the cached part of the prompt (the cached content).
cached_tokens_by_modality ModalityTokens (optional)
A breakdown of cached token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Total number of tokens across all the generated responses.
output_tokens_by_modality ModalityTokens (optional)
A breakdown of output token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens present in tool-use prompt(s).
tool_use_tokens_by_modality ModalityTokens (optional)
A breakdown of tool-use token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens of thoughts for thinking models.
Total token count for the interaction request (prompt + responses + other internal tokens).
grounding_tool_count GroundingToolCount (optional)
Grounding tool count.
Fields
The grounding tool type associated with the count.
Possible values:
- google_search: Grounding with Google Web Search and Image Search, and Web Grounding for Enterprise.
- google_maps: Grounding with Google Maps.
- retrieval: Grounding with customer's data, for example, VertexAISearch.
The count for this grounding tool type.
steps Step (optional)
Output only. The steps that make up the interaction, if included in this event.
Possible Types
Polymorphic discriminator: type
UserInputStep
Input provided by the user.
content Content (optional)
No description provided.
Possible Types
Polymorphic discriminator: type
TextContent
A text content block.
No description provided.
Always set to "text".
Required. The text content.
annotations Annotation (optional)
Citation information for model-generated content.
Possible Types
Polymorphic discriminator: type
UrlCitation
A URL citation annotation.
No description provided.
Always set to "url_citation".
The URL.
The title of the URL.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
FileCitation
A file citation annotation.
No description provided.
Always set to "file_citation".
The URI of the file.
The name of the file.
Source attributed for a portion of the text.
User provided metadata about the retrieved context.
Page number of the cited document, if applicable.
Media ID in case of image citations, if applicable.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
PlaceCitation
A place citation annotation.
No description provided.
Always set to "place_citation".
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
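The citation annotations above give segment boundaries in bytes, with an exclusive end, so slicing the text by character position can pick the wrong span once non-ASCII characters are involved. The sketch below assumes the offsets refer to the UTF-8 encoding of the text, which this reference does not state. `cited_segment` and its parameter names are hypothetical.

```python
def cited_segment(text, start_index, end_index):
    """Return the part of `text` covered by byte offsets [start_index, end_index)."""
    data = text.encode("utf-8")  # assumption: offsets are measured on UTF-8 bytes
    return data[start_index:end_index].decode("utf-8")


text = "Café prices rose in 2024."

# "é" occupies two bytes in UTF-8, so the first word spans bytes 0..5.
print(cited_segment(text, 0, 5))
# Slicing by characters with the same numbers also includes the following space.
print(repr(text[0:5]))
```

If an offset lands in the middle of a multi-byte character, the decode step raises `UnicodeDecodeError`, which is a useful signal that the encoding assumption is wrong.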
ImageContent
An image content block.
No description provided.
Always set to "image".
The image content.
The URI of the image.
The mime type of the image.
Possible values:
- image/png: PNG image format
- image/jpeg: JPEG image format
- image/webp: WebP image format
- image/heic: HEIC image format
- image/heif: HEIF image format
- image/gif: GIF image format
- image/bmp: BMP image format
- image/tiff: TIFF image format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- low: Low resolution.
- medium: Medium resolution.
- high: High resolution.
- ultra_high: Ultra high resolution.
AudioContent
An audio content block.
No description provided.
Always set to "audio".
The audio content.
The URI of the audio.
The mime type of the audio.
Possible values:
- audio/wav: WAV audio format
- audio/mp3: MP3 audio format
- audio/aiff: AIFF audio format
- audio/aac: AAC audio format
- audio/ogg: OGG audio format
- audio/flac: FLAC audio format
- audio/mpeg: MPEG audio format
- audio/m4a: M4A audio format
- audio/l16: L16 audio format
- audio/opus: OPUS audio format
- audio/alaw: ALAW audio format
- audio/mulaw: MULAW audio format
The number of audio channels.
The sample rate of the audio.
DocumentContent
A document content block.
No description provided.
Always set to "document".
The document content.
The URI of the document.
The mime type of the document.
Possible values:
- application/pdf: PDF document format
- text/csv: CSV document format
VideoContent
A video content block.
No description provided.
Always set to "video".
The video content.
The URI of the video.
The mime type of the video.
Possible values:
- video/mp4: MP4 video format
- video/mpeg: MPEG video format
- video/mpg: MPG video format
- video/mov: MOV video format
- video/avi: AVI video format
- video/x-flv: FLV video format
- video/webm: WebM video format
- video/wmv: WMV video format
- video/3gpp: 3GPP video format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- low: Low resolution.
- medium: Medium resolution.
- high: High resolution.
- ultra_high: Ultra high resolution.
No description provided.
Always set to "user_input".
ModelOutputStep
Output generated by the model.
No description provided.
Always set to "model_output".
content Content (optional)
No description provided.
Possible Types
Polymorphic discriminator: type
TextContent
A text content block.
No description provided.
Always set to "text".
Required. The text content.
annotations Annotation (optional)
Citation information for model-generated content.
Possible Types
Polymorphic discriminator: type
UrlCitation
A URL citation annotation.
No description provided.
Always set to "url_citation".
The URL.
The title of the URL.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
FileCitation
A file citation annotation.
No description provided.
Always set to "file_citation".
The URI of the file.
The name of the file.
Source attributed for a portion of the text.
User provided metadata about the retrieved context.
Page number of the cited document, if applicable.
Media ID in case of image citations, if applicable.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
PlaceCitation
A place citation annotation.
No description provided.
Always set to "place_citation".
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
ImageContent
An image content block.
No description provided.
Always set to "image".
The image content.
The URI of the image.
The mime type of the image.
Possible values:
- image/png: PNG image format
- image/jpeg: JPEG image format
- image/webp: WebP image format
- image/heic: HEIC image format
- image/heif: HEIF image format
- image/gif: GIF image format
- image/bmp: BMP image format
- image/tiff: TIFF image format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- low: Low resolution.
- medium: Medium resolution.
- high: High resolution.
- ultra_high: Ultra high resolution.
AudioContent
An audio content block.
No description provided.
Always set to "audio".
The audio content.
The URI of the audio.
The mime type of the audio.
Possible values:
- audio/wav: WAV audio format
- audio/mp3: MP3 audio format
- audio/aiff: AIFF audio format
- audio/aac: AAC audio format
- audio/ogg: OGG audio format
- audio/flac: FLAC audio format
- audio/mpeg: MPEG audio format
- audio/m4a: M4A audio format
- audio/l16: L16 audio format
- audio/opus: OPUS audio format
- audio/alaw: ALAW audio format
- audio/mulaw: MULAW audio format
The number of audio channels.
The sample rate of the audio.
DocumentContent
A document content block.
No description provided.
Always set to "document".
The document content.
The URI of the document.
The mime type of the document.
Possible values:
- application/pdf: PDF document format
- text/csv: CSV document format
VideoContent
A video content block.
No description provided.
Always set to "video".
The video content.
The URI of the video.
The mime type of the video.
Possible values:
- video/mp4: MP4 video format
- video/mpeg: MPEG video format
- video/mpg: MPG video format
- video/mov: MOV video format
- video/avi: AVI video format
- video/x-flv: FLV video format
- video/webm: WebM video format
- video/wmv: WMV video format
- video/3gpp: 3GPP video format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- low: Low resolution.
- medium: Medium resolution.
- high: High resolution.
- ultra_high: Ultra high resolution.
ThoughtStep
A thought step.
No description provided.
Always set to "thought".
A signature hash for backend validation.
summary ThoughtSummaryContent (optional)
A summary of the thought.
Possible Types
Polymorphic discriminator: type
TextContent
A text content block.
No description provided.
Always set to "text".
Required. The text content.
annotations Annotation (optional)
Citation information for model-generated content.
Possible Types
Polymorphic discriminator: type
UrlCitation
A URL citation annotation.
No description provided.
Always set to "url_citation".
The URL.
The title of the URL.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
FileCitation
A file citation annotation.
No description provided.
Always set to "file_citation".
The URI of the file.
The name of the file.
Source attributed for a portion of the text.
User provided metadata about the retrieved context.
Page number of the cited document, if applicable.
Media ID, in the case of image citations, if applicable.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
PlaceCitation
A place citation annotation.
No description provided.
Always set to "place_citation".
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
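Because citation offsets are measured in bytes rather than characters, slicing the text directly can return the wrong span when the text contains multi-byte characters. The following is a minimal sketch, assuming the offsets refer to the UTF-8 encoding of the text and that the two index fields are named `start_index` and `end_index`. Both the encoding and the field names are assumptions; neither is stated in this reference.

```python
def cited_segment(text: str, start_index: int, end_index: int) -> str:
    """Return the part of `text` covered by the byte range [start_index, end_index).

    Assumes the offsets count bytes of the UTF-8 encoding of `text`.
    """
    data = text.encode("utf-8")
    # errors="replace" avoids an exception if a range splits a multi-byte character.
    return data[start_index:end_index].decode("utf-8", errors="replace")


# Hypothetical annotation; the field names are illustrative only.
text = "Café menus: see the guide."
annotation = {"type": "url_citation", "start_index": 0, "end_index": 5}

segment = cited_segment(text, annotation["start_index"], annotation["end_index"])
print(segment)  # Café  ("é" occupies two bytes in UTF-8)
```

A plain character slice, `text[0:5]`, would include the trailing space here, which is why the sketch encodes before slicing.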
ImageContent
An image content block.
No description provided.
Always set to "image".
The image content.
The URI of the image.
The mime type of the image.
Possible values:
- image/png: PNG image format
- image/jpeg: JPEG image format
- image/webp: WebP image format
- image/heic: HEIC image format
- image/heif: HEIF image format
- image/gif: GIF image format
- image/bmp: BMP image format
- image/tiff: TIFF image format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- low: Low resolution.
- medium: Medium resolution.
- high: High resolution.
- ultra_high: Ultra high resolution.
FunctionCallStep
A function tool call step.
No description provided.
Always set to "function_call".
Required. The name of the tool to call.
Required. The arguments to pass to the function.
Required. A unique ID for this specific tool call.
CodeExecutionCallStep
Code execution call step.
No description provided.
Always set to "code_execution_call".
arguments CodeExecutionCallStepArguments (required)
Required. The arguments to pass to the code execution.
Fields
Programming language of the `code`.
Possible values:
- python: Python >= 3.10, with numpy and simpy available.
The code to be executed.
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
UrlContextCallStep
URL context call step.
No description provided.
Always set to "url_context_call".
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
arguments UrlContextCallArguments (required)
The arguments to pass to the URL context.
Fields
The URLs to fetch.
McpServerToolCallStep
MCPServer tool call step.
No description provided.
Always set to "mcp_server_tool_call".
Required. The name of the tool which was called.
Required. The name of the used MCP server.
Required. The JSON object of arguments for the function.
Required. A unique ID for this specific tool call.
GoogleSearchCallStep
Google Search call step.
No description provided.
Always set to "google_search_call".
arguments GoogleSearchCallStepArguments (required)
Required. The arguments to pass to Google Search.
Fields
Web search queries for the follow-up web search.
The type of search grounding enabled.
Possible values:
- web_search: Setting this field enables web search. Only text results are returned.
- image_search: Setting this field enables image search. Image bytes are returned.
- enterprise_web_search: Setting this field enables enterprise web search.
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
FileSearchCallStep
File Search call step.
No description provided.
Always set to "file_search_call".
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
GoogleMapsCallStep
Google Maps call step.
No description provided.
Always set to "google_maps_call".
arguments GoogleMapsCallStepArguments (optional)
The arguments to pass to the Google Maps tool.
Fields
The queries to be executed.
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
FunctionResultStep
Result of a function tool call.
No description provided.
Always set to "function_result".
The name of the tool that was called.
Whether the tool call resulted in an error.
Required. ID to match the ID from the function call block.
The result of the tool call.
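A function result is tied back to its function call through the call ID. The sketch below shows one way a client might execute a `function_call` step locally and build the matching `function_result` step. The key names `id`, `call_id`, `is_error`, and `result` are assumptions made for illustration (this reference does not show the field names), and `registry` is a hypothetical mapping from tool names to local Python callables.

```python
from typing import Any, Callable


def run_function_call(
    step: dict[str, Any],
    registry: dict[str, Callable[..., Any]],
) -> dict[str, Any]:
    """Execute one function_call step and return a function_result step."""
    name = step["name"]
    try:
        result = registry[name](**step["arguments"])
        is_error = False
    except Exception as exc:  # unknown tool names and tool failures both land here
        result = f"{type(exc).__name__}: {exc}"
        is_error = True
    return {
        "type": "function_result",
        "name": name,
        "call_id": step["id"],  # must echo the ID from the call step
        "is_error": is_error,
        "result": result,
    }


# Hypothetical call step and tool registry.
call = {"type": "function_call", "name": "add", "arguments": {"a": 2, "b": 3}, "id": "call_1"}
ok = run_function_call(call, {"add": lambda a, b: a + b})
missing = run_function_call(call, {})
```

Reporting failures through the error flag, rather than raising, lets the model see what went wrong and decide how to proceed.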
CodeExecutionResultStep
Code execution result step.
No description provided.
Always set to "code_execution_result".
Required. The output of the code execution.
Whether the code execution resulted in an error.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
UrlContextResultStep
URL context result step.
No description provided.
Always set to "url_context_result".
result UrlContextResult (required)
Required. The results of the URL context.
Fields
The URL that was fetched.
The status of the URL retrieval.
Possible values:
- success: URL retrieval succeeded.
- error: URL retrieval failed due to an error.
- paywall: URL retrieval failed because the content is behind a paywall.
- unsafe: URL retrieval failed because the content is unsafe.
Whether the URL context resulted in an error.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
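The retrieval status makes it possible to separate pages that were actually fetched from pages that were skipped. As a rough sketch, assuming `result` arrives as a list of objects with `url` and `status` keys (the list shape and both key names are assumptions, not confirmed by this reference):

```python
from typing import Any


def split_url_results(results: list[dict[str, Any]]) -> tuple[list[str], dict[str, str]]:
    """Separate fetched URLs from those that failed, keeping the failure status."""
    fetched: list[str] = []
    skipped: dict[str, str] = {}
    for item in results:
        if item["status"] == "success":
            fetched.append(item["url"])
        else:
            # "error", "paywall", and "unsafe" all mean the content was not retrieved.
            skipped[item["url"]] = item["status"]
    return fetched, skipped


# Hypothetical result payload.
results = [
    {"url": "https://example.com/a", "status": "success"},
    {"url": "https://example.com/b", "status": "paywall"},
    {"url": "https://example.com/c", "status": "unsafe"},
]
fetched, skipped = split_url_results(results)
```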
GoogleSearchResultStep
Google Search result step.
No description provided.
Always set to "google_search_result".
result GoogleSearchResultItem (required)
Required. The results of the Google Search.
Fields
Web content snippet that can be embedded in a web page or an app webview.
Whether the Google Search resulted in an error.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
McpServerToolResultStep
MCPServer tool result step.
No description provided.
Always set to "mcp_server_tool_result".
The name of the tool that was called in this specific tool call.
The name of the used MCP server.
Required. ID to match the ID from the function call block.
The output from the MCP server call. Can be simple text or rich content.
FileSearchResultStep
File Search result step.
No description provided.
Always set to "file_search_result".
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
GoogleMapsResultStep
Google Maps result step.
No description provided.
Always set to "google_maps_result".
result GoogleMapsResultItem (required)
No description provided.
Fields
places GoogleMapsResultPlaces (optional)
No description provided.
Fields
No description provided.
No description provided.
No description provided.
review_snippets ReviewSnippet (optional)
No description provided.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
No description provided.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
InteractionCompletedEvent
No description provided.
Always set to "interaction.completed".
The event_id token to be used to resume the interaction stream, from this event.
metadata StreamMetadata (optional)
Optional metadata accompanying ANY streamed event.
Fields
total_usage Usage (optional)
No description provided.
Fields
Number of tokens in the prompt (context).
input_tokens_by_modality ModalityTokens (optional)
A breakdown of input token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens in the cached part of the prompt (the cached content).
cached_tokens_by_modality ModalityTokens (optional)
A breakdown of cached token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Total number of tokens across all the generated responses.
output_tokens_by_modality ModalityTokens (optional)
A breakdown of output token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens present in tool-use prompt(s).
tool_use_tokens_by_modality ModalityTokens (optional)
A breakdown of tool-use token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens of thoughts for thinking models.
Total token count for the interaction request (prompt + responses + other internal tokens).
grounding_tool_count GroundingToolCount (optional)
Grounding tool count.
Fields
The grounding tool type associated with the count.
Possible values:
- google_search: Grounding with Google Web Search and Image Search, and Web Grounding for Enterprise.
- google_maps: Grounding with Google Maps.
- retrieval: Grounding with customer's data, for example, VertexAISearch.
The number of grounding tool counts.
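Each `*_tokens_by_modality` field pairs a modality with a token count, so a usage object can be folded into a simple per-modality table. The sketch below assumes each of these fields is a list of objects and that the count is stored under a key named `tokens`; only `modality` and the four `*_tokens_by_modality` names appear in this reference, so the rest is an assumption.

```python
from typing import Any


def modality_breakdown(usage: dict[str, Any], field: str) -> dict[str, int]:
    """Sum token counts per modality for one `*_tokens_by_modality` field."""
    totals: dict[str, int] = {}
    for entry in usage.get(field) or []:  # the field is optional and may be absent
        modality = entry.get("modality", "unknown")
        totals[modality] = totals.get(modality, 0) + entry.get("tokens", 0)
    return totals


# Hypothetical usage payload; the numbers are made up for illustration.
usage = {
    "input_tokens_by_modality": [
        {"modality": "text", "tokens": 120},
        {"modality": "image", "tokens": 258},
    ],
    "output_tokens_by_modality": [{"modality": "text", "tokens": 64}],
}

input_breakdown = modality_breakdown(usage, "input_tokens_by_modality")
cached_breakdown = modality_breakdown(usage, "cached_tokens_by_modality")
```

Since every breakdown is optional, the helper returns an empty mapping when a field is missing instead of raising.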
interaction InteractionSseEventInteraction (required)
Partial completed interaction resource emitted at the end of the stream.
Fields
Required. Output only. A unique identifier for the interaction completion.
Output only. The resource type.
The model that will complete your prompt.
The agent to interact with.
Required. Output only. The status of the interaction.
Possible values:
- in_progress: The interaction is in progress.
- requires_action: The interaction requires action/input from the user.
- completed: The interaction is completed.
- failed: The interaction failed.
- cancelled: The interaction was cancelled.
- incomplete: The interaction is completed, but contains incomplete results (e.g. hitting max_tokens).
Output only. The time at which the response was created in ISO 8601 format.
Output only. The time at which the response was last updated in ISO 8601 format.
service_tier ServiceTier (optional)
The service tier for the interaction.
Possible values
- flex: Flex service tier.
- standard: Standard service tier.
- priority: Priority service tier.
usage Usage (optional)
Output only. Statistics on the interaction request's token usage.
Fields
Number of tokens in the prompt (context).
input_tokens_by_modality ModalityTokens (optional)
A breakdown of input token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens in the cached part of the prompt (the cached content).
cached_tokens_by_modality ModalityTokens (optional)
A breakdown of cached token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Total number of tokens across all the generated responses.
output_tokens_by_modality ModalityTokens (optional)
A breakdown of output token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens present in tool-use prompt(s).
tool_use_tokens_by_modality ModalityTokens (optional)
A breakdown of tool-use token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens of thoughts for thinking models.
Total token count for the interaction request (prompt + responses + other internal tokens).
grounding_tool_count GroundingToolCount (optional)
Grounding tool count.
Fields
The grounding tool type associated with the count.
Possible values:
- google_search: Grounding with Google Web Search and Image Search, and Web Grounding for Enterprise.
- google_maps: Grounding with Google Maps.
- retrieval: Grounding with customer's data, for example, VertexAISearch.
The number of grounding tool counts.
steps Step (optional)
Output only. The steps that make up the interaction, if included in this event.
Possible Types
Polymorphic discriminator: type
UserInputStep
Input provided by the user.
content Content (optional)
No description provided.
Possible Types
Polymorphic discriminator: type
TextContent
A text content block.
No description provided.
Always set to "text".
Required. The text content.
annotations Annotation (optional)
Citation information for model-generated content.
Possible Types
Polymorphic discriminator: type
UrlCitation
A URL citation annotation.
No description provided.
Always set to "url_citation".
The URL.
The title of the URL.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
FileCitation
A file citation annotation.
No description provided.
Always set to "file_citation".
The URI of the file.
The name of the file.
Source attributed for a portion of the text.
User provided metadata about the retrieved context.
Page number of the cited document, if applicable.
Media ID, in the case of image citations, if applicable.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
PlaceCitation
A place citation annotation.
No description provided.
Always set to "place_citation".
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
ImageContent
An image content block.
No description provided.
Always set to "image".
The image content.
The URI of the image.
The mime type of the image.
Possible values:
- image/png: PNG image format
- image/jpeg: JPEG image format
- image/webp: WebP image format
- image/heic: HEIC image format
- image/heif: HEIF image format
- image/gif: GIF image format
- image/bmp: BMP image format
- image/tiff: TIFF image format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- low: Low resolution.
- medium: Medium resolution.
- high: High resolution.
- ultra_high: Ultra high resolution.
AudioContent
An audio content block.
No description provided.
Always set to "audio".
The audio content.
The URI of the audio.
The mime type of the audio.
Possible values:
- audio/wav: WAV audio format
- audio/mp3: MP3 audio format
- audio/aiff: AIFF audio format
- audio/aac: AAC audio format
- audio/ogg: OGG audio format
- audio/flac: FLAC audio format
- audio/mpeg: MPEG audio format
- audio/m4a: M4A audio format
- audio/l16: L16 audio format
- audio/opus: OPUS audio format
- audio/alaw: ALAW audio format
- audio/mulaw: MULAW audio format
The number of audio channels.
The sample rate of the audio.
DocumentContent
A document content block.
No description provided.
Always set to "document".
The document content.
The URI of the document.
The mime type of the document.
Possible values:
- application/pdf: PDF document format
- text/csv: CSV document format
VideoContent
A video content block.
No description provided.
Always set to "video".
The video content.
The URI of the video.
The mime type of the video.
Possible values:
- video/mp4: MP4 video format
- video/mpeg: MPEG video format
- video/mpg: MPG video format
- video/mov: MOV video format
- video/avi: AVI video format
- video/x-flv: FLV video format
- video/webm: WebM video format
- video/wmv: WMV video format
- video/3gpp: 3GPP video format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- low: Low resolution.
- medium: Medium resolution.
- high: High resolution.
- ultra_high: Ultra high resolution.
No description provided.
Always set to "user_input".
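The MIME types listed for image, audio, document, and video content can be checked on the client before a request is sent. This is a minimal sketch: the MIME values are copied from the lists above, while the `mime_type` key name and the helper itself are assumptions made for illustration.

```python
from typing import Any

# MIME types copied from the content block descriptions in this reference.
ALLOWED_MIME_TYPES: dict[str, frozenset[str]] = {
    "image": frozenset({
        "image/png", "image/jpeg", "image/webp", "image/heic",
        "image/heif", "image/gif", "image/bmp", "image/tiff",
    }),
    "audio": frozenset({
        "audio/wav", "audio/mp3", "audio/aiff", "audio/aac", "audio/ogg", "audio/flac",
        "audio/mpeg", "audio/m4a", "audio/l16", "audio/opus", "audio/alaw", "audio/mulaw",
    }),
    "document": frozenset({"application/pdf", "text/csv"}),
    "video": frozenset({
        "video/mp4", "video/mpeg", "video/mpg", "video/mov", "video/avi",
        "video/x-flv", "video/webm", "video/wmv", "video/3gpp",
    }),
}


def has_supported_mime_type(block: dict[str, Any]) -> bool:
    """Check a content block's MIME type against the list for its `type`."""
    allowed = ALLOWED_MIME_TYPES.get(block.get("type", ""))
    if allowed is None:
        # Types such as "text" have no MIME list, so there is nothing to check.
        return True
    return block.get("mime_type") in allowed


# Hypothetical content blocks.
pdf_block = {"type": "document", "uri": "https://example.com/report.pdf", "mime_type": "application/pdf"}
docx_block = {"type": "document", "mime_type": "application/msword"}
text_block = {"type": "text", "text": "hello"}
```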
ModelOutputStep
Output generated by the model.
No description provided.
Always set to "model_output".
content Content (optional)
No description provided.
Possible Types
Polymorphic discriminator: type
TextContent
A text content block.
No description provided.
Always set to "text".
Required. The text content.
annotations Annotation (optional)
Citation information for model-generated content.
Possible Types
Polymorphic discriminator: type
UrlCitation
A URL citation annotation.
No description provided.
Always set to "url_citation".
The URL.
The title of the URL.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
FileCitation
A file citation annotation.
No description provided.
Always set to "file_citation".
The URI of the file.
The name of the file.
Source attributed for a portion of the text.
User provided metadata about the retrieved context.
Page number of the cited document, if applicable.
Media ID, in the case of image citations, if applicable.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
PlaceCitation
A place citation annotation.
No description provided.
Always set to "place_citation".
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
ImageContent
An image content block.
No description provided.
Always set to "image".
The image content.
The URI of the image.
The mime type of the image.
Possible values:
- image/png: PNG image format
- image/jpeg: JPEG image format
- image/webp: WebP image format
- image/heic: HEIC image format
- image/heif: HEIF image format
- image/gif: GIF image format
- image/bmp: BMP image format
- image/tiff: TIFF image format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- low: Low resolution.
- medium: Medium resolution.
- high: High resolution.
- ultra_high: Ultra high resolution.
AudioContent
An audio content block.
No description provided.
Always set to "audio".
The audio content.
The URI of the audio.
The mime type of the audio.
Possible values:
- audio/wav: WAV audio format
- audio/mp3: MP3 audio format
- audio/aiff: AIFF audio format
- audio/aac: AAC audio format
- audio/ogg: OGG audio format
- audio/flac: FLAC audio format
- audio/mpeg: MPEG audio format
- audio/m4a: M4A audio format
- audio/l16: L16 audio format
- audio/opus: OPUS audio format
- audio/alaw: ALAW audio format
- audio/mulaw: MULAW audio format
The number of audio channels.
The sample rate of the audio.
DocumentContent
A document content block.
No description provided.
Always set to "document".
The document content.
The URI of the document.
The mime type of the document.
Possible values:
- application/pdf: PDF document format
- text/csv: CSV document format
VideoContent
A video content block.
No description provided.
Always set to "video".
The video content.
The URI of the video.
The mime type of the video.
Possible values:
- video/mp4: MP4 video format
- video/mpeg: MPEG video format
- video/mpg: MPG video format
- video/mov: MOV video format
- video/avi: AVI video format
- video/x-flv: FLV video format
- video/webm: WebM video format
- video/wmv: WMV video format
- video/3gpp: 3GPP video format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- low: Low resolution.
- medium: Medium resolution.
- high: High resolution.
- ultra_high: Ultra high resolution.
ThoughtStep
A thought step.
No description provided.
Always set to "thought".
A signature hash for backend validation.
summary ThoughtSummaryContent (optional)
A summary of the thought.
Possible Types
Polymorphic discriminator: type
TextContent
A text content block.
No description provided.
Always set to "text".
Required. The text content.
annotations Annotation (optional)
Citation information for model-generated content.
Possible Types
Polymorphic discriminator: type
UrlCitation
A URL citation annotation.
No description provided.
Always set to "url_citation".
The URL.
The title of the URL.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
FileCitation
A file citation annotation.
No description provided.
Always set to "file_citation".
The URI of the file.
The name of the file.
Source attributed for a portion of the text.
User provided metadata about the retrieved context.
Page number of the cited document, if applicable.
Media ID, in the case of image citations, if applicable.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
PlaceCitation
A place citation annotation.
No description provided.
Always set to "place_citation".
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
ImageContent
An image content block.
No description provided.
Always set to "image".
The image content.
The URI of the image.
The mime type of the image.
Possible values:
- image/png: PNG image format
- image/jpeg: JPEG image format
- image/webp: WebP image format
- image/heic: HEIC image format
- image/heif: HEIF image format
- image/gif: GIF image format
- image/bmp: BMP image format
- image/tiff: TIFF image format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- low: Low resolution.
- medium: Medium resolution.
- high: High resolution.
- ultra_high: Ultra high resolution.
FunctionCallStep
A function tool call step.
No description provided.
Always set to "function_call".
Required. The name of the tool to call.
Required. The arguments to pass to the function.
Required. A unique ID for this specific tool call.
CodeExecutionCallStep
Code execution call step.
No description provided.
Always set to "code_execution_call".
arguments CodeExecutionCallStepArguments (required)
Required. The arguments to pass to the code execution.
Fields
Programming language of the `code`.
Possible values:
- python: Python >= 3.10, with numpy and simpy available.
The code to be executed.
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
UrlContextCallStep
URL context call step.
No description provided.
Always set to "url_context_call".
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
arguments UrlContextCallArguments (required)
The arguments to pass to the URL context.
Fields
The URLs to fetch.
McpServerToolCallStep
MCPServer tool call step.
No description provided.
Always set to "mcp_server_tool_call".
Required. The name of the tool which was called.
Required. The name of the used MCP server.
Required. The JSON object of arguments for the function.
Required. A unique ID for this specific tool call.
GoogleSearchCallStep
Google Search call step.
No description provided.
Always set to "google_search_call".
arguments GoogleSearchCallStepArguments (required)
Required. The arguments to pass to Google Search.
Fields
Web search queries for the follow-up web search.
The type of search grounding enabled.
Possible values:
- web_search: Setting this field enables web search. Only text results are returned.
- image_search: Setting this field enables image search. Image bytes are returned.
- enterprise_web_search: Setting this field enables enterprise web search.
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
FileSearchCallStep
File Search call step.
No description provided.
Always set to "file_search_call".
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
GoogleMapsCallStep
Google Maps call step.
No description provided.
Always set to "google_maps_call".
arguments GoogleMapsCallStepArguments (optional)
The arguments to pass to the Google Maps tool.
Fields
The queries to be executed.
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
FunctionResultStep
Result of a function tool call.
No description provided.
Always set to "function_result".
The name of the tool that was called.
Whether the tool call resulted in an error.
Required. ID to match the ID from the function call block.
The result of the tool call.
CodeExecutionResultStep
Code execution result step.
No description provided.
Always set to "code_execution_result".
Required. The output of the code execution.
Whether the code execution resulted in an error.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
UrlContextResultStep
URL context result step.
No description provided.
Always set to "url_context_result".
result UrlContextResult (required)
Required. The results of the URL context.
Fields
The URL that was fetched.
The status of the URL retrieval.
Possible values:
- success: URL retrieval succeeded.
- error: URL retrieval failed due to an error.
- paywall: URL retrieval failed because the content is behind a paywall.
- unsafe: URL retrieval failed because the content is unsafe.
Whether the URL context resulted in an error.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
GoogleSearchResultStep
Google Search result step.
No description provided.
Always set to "google_search_result".
result GoogleSearchResultItem (required)
Required. The results of the Google Search.
Fields
Web content snippet that can be embedded in a web page or an app webview.
Whether the Google Search resulted in an error.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
McpServerToolResultStep
MCPServer tool result step.
No description provided.
Always set to "mcp_server_tool_result".
The name of the tool that was called in this specific tool call.
The name of the used MCP server.
Required. ID to match the ID from the function call block.
The output from the MCP server call. Can be simple text or rich content.
FileSearchResultStep
File Search result step.
No description provided.
Always set to "file_search_result".
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
GoogleMapsResultStep
Google Maps result step.
No description provided.
Always set to "google_maps_result".
result GoogleMapsResultItem (required)
No description provided.
Fields
places GoogleMapsResultPlaces (optional)
No description provided.
Fields
No description provided.
No description provided.
No description provided.
review_snippets ReviewSnippet (optional)
No description provided.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
No description provided.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
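Because every step and every content block carries a `type` discriminator, a client can pick out the pieces it needs by filtering on that value. The sketch below gathers the text produced by the model from a list of steps. It assumes `content` is a list of content blocks and that a text block stores its text under a key named `text`; both are assumptions, since this reference shows only the `content` field name and the `"model_output"` and `"text"` discriminator values.

```python
from typing import Any


def output_text(steps: list[dict[str, Any]]) -> str:
    """Concatenate the text blocks of all model_output steps, in order."""
    parts: list[str] = []
    for step in steps:
        if step.get("type") != "model_output":
            continue  # skip user input, thoughts, tool calls, and tool results
        for block in step.get("content") or []:
            if block.get("type") == "text":
                parts.append(block["text"])
    return "".join(parts)


# Hypothetical steps list.
steps = [
    {"type": "user_input", "content": [{"type": "text", "text": "What is 2 + 3?"}]},
    {"type": "function_call", "name": "add", "arguments": {"a": 2, "b": 3}, "id": "call_1"},
    {"type": "model_output", "content": [
        {"type": "text", "text": "The answer "},
        {"type": "text", "text": "is 5."},
    ]},
]
answer = output_text(steps)
```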
InteractionStatusUpdate
No description provided.
Always set to "interaction.status_update".
No description provided.
No description provided.
Possible values:
- in_progress: The interaction is in progress.
- requires_action: The interaction requires action/input from the user.
- completed: The interaction is completed.
- failed: The interaction failed.
- cancelled: The interaction was cancelled.
- incomplete: The interaction is completed, but contains incomplete results (e.g. hitting max_tokens).
- budget_exceeded: The interaction was halted because the token budget was exceeded.
The event_id token to be used to resume the interaction stream, from this event.
metadata StreamMetadata (optional)
Optional metadata accompanying ANY streamed event.
Fields
total_usage Usage (optional)
No description provided.
Fields
Number of tokens in the prompt (context).
input_tokens_by_modality ModalityTokens (optional)
A breakdown of input token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens in the cached part of the prompt (the cached content).
cached_tokens_by_modality ModalityTokens (optional)
A breakdown of cached token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Total number of tokens across all the generated responses.
output_tokens_by_modality ModalityTokens (optional)
A breakdown of output token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens present in tool-use prompt(s).
tool_use_tokens_by_modality ModalityTokens (optional)
A breakdown of tool-use token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- `text`: Indicates the model should return text.
- `image`: Indicates the model should return images.
- `audio`: Indicates the model should return audio.
- `video`: Indicates the model should return video.
- `document`: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens of thoughts for thinking models.
Total token count for the interaction request (prompt + responses + other internal tokens).
grounding_tool_count GroundingToolCount (optional)
Grounding tool count.
Fields
The grounding tool type associated with the count.
Possible values:
- `google_search`: Grounding with Google Web Search, Image Search, and Web Grounding for Enterprise.
- `google_maps`: Grounding with Google Maps.
- `retrieval`: Grounding with the customer's data, for example, Vertex AI Search.
The count for this grounding tool type.
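As a rough illustration of how the per-modality breakdowns in `total_usage` might be consumed, the sketch below sums token counts per modality for each breakdown field. It assumes the usage object has already been parsed into a Python dict, that each `*_by_modality` field is a list of entries, and that the per-modality count sits under a key named `tokens`. That key name is an assumption, since the reference above does not name the field.

```python
from collections import Counter
from typing import Any

# The four per-modality breakdown fields named in the Usage reference above.
BREAKDOWN_FIELDS = (
    "input_tokens_by_modality",
    "cached_tokens_by_modality",
    "output_tokens_by_modality",
    "tool_use_tokens_by_modality",
)


def tokens_by_modality(usage: dict[str, Any]) -> dict[str, Counter]:
    """Return {breakdown_field: Counter({modality: tokens})} for a usage dict."""
    result = {}
    for field in BREAKDOWN_FIELDS:
        counts = Counter()
        # Every breakdown is optional, so a missing field yields an empty Counter.
        for entry in usage.get(field) or []:
            # "tokens" is an assumed key name for the per-modality count.
            counts[entry.get("modality", "unknown")] += entry.get("tokens", 0)
        result[field] = counts
    return result


# Hypothetical usage payload, for illustration only.
example_usage = {
    "input_tokens_by_modality": [
        {"modality": "text", "tokens": 120},
        {"modality": "image", "tokens": 258},
    ],
    "output_tokens_by_modality": [{"modality": "text", "tokens": 64}],
}
summary = tokens_by_modality(example_usage)
print(summary["input_tokens_by_modality"])
```

Because every breakdown is optional, the helper treats an absent field as empty rather than raising an error.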
ErrorEvent
No description provided.
Always set to "error".
error Error (optional)
No description provided.
Fields
A URI that identifies the error type.
A human-readable error message.
The event_id token to be used to resume the interaction stream, from this event.
metadata StreamMetadata (optional)
Optional metadata accompanying ANY streamed event.
Fields
total_usage Usage (optional)
No description provided.
Fields
Number of tokens in the prompt (context).
input_tokens_by_modality ModalityTokens (optional)
A breakdown of input token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- `text`: Indicates the model should return text.
- `image`: Indicates the model should return images.
- `audio`: Indicates the model should return audio.
- `video`: Indicates the model should return video.
- `document`: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens in the cached part of the prompt (the cached content).
cached_tokens_by_modality ModalityTokens (optional)
A breakdown of cached token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- `text`: Indicates the model should return text.
- `image`: Indicates the model should return images.
- `audio`: Indicates the model should return audio.
- `video`: Indicates the model should return video.
- `document`: Indicates the model should return documents.
Number of tokens for the modality.
Total number of tokens across all the generated responses.
output_tokens_by_modality ModalityTokens (optional)
A breakdown of output token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- `text`: Indicates the model should return text.
- `image`: Indicates the model should return images.
- `audio`: Indicates the model should return audio.
- `video`: Indicates the model should return video.
- `document`: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens present in tool-use prompt(s).
tool_use_tokens_by_modality ModalityTokens (optional)
A breakdown of tool-use token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- `text`: Indicates the model should return text.
- `image`: Indicates the model should return images.
- `audio`: Indicates the model should return audio.
- `video`: Indicates the model should return video.
- `document`: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens of thoughts for thinking models.
Total token count for the interaction request (prompt + responses + other internal tokens).
grounding_tool_count GroundingToolCount (optional)
Grounding tool count.
Fields
The grounding tool type associated with the count.
Possible values:
- `google_search`: Grounding with Google Web Search, Image Search, and Web Grounding for Enterprise.
- `google_maps`: Grounding with Google Maps.
- `retrieval`: Grounding with the customer's data, for example, Vertex AI Search.
The count for this grounding tool type.
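One way a client might handle an ErrorEvent while keeping track of the `event_id` resume token is sketched below. It assumes events arrive as already-parsed dicts, that the discriminator is carried in a key named `event_type`, and that the error object exposes `code` (the URI) and `message` keys. Those three key names are assumptions, because the reference above describes the fields without naming them.

```python
from typing import Any, Iterable, Optional


class InteractionStreamError(Exception):
    """Raised when the stream yields an error event (hypothetical helper)."""

    def __init__(self, error_type: str, message: str, last_event_id: Optional[str]):
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message
        # The most recent resume token seen before or on the error event.
        self.last_event_id = last_event_id


def consume(events: Iterable[dict[str, Any]]) -> Optional[str]:
    """Iterate events, remember the latest event_id, and raise on an error event."""
    last_event_id = None
    for event in events:
        # Keep the previous token when an event carries no event_id.
        last_event_id = event.get("event_id", last_event_id)
        # "event_type", "code" and "message" are assumed key names.
        if event.get("event_type") == "error":
            error = event.get("error") or {}
            raise InteractionStreamError(
                error.get("code", "unknown"),
                error.get("message", ""),
                last_event_id,
            )
    return last_event_id
```

A caller could catch `InteractionStreamError` and pass `last_event_id` to whatever resume mechanism the client library provides.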
StepStart
No description provided.
Always set to "step.start".
No description provided.
step Step (required)
No description provided.
Possible Types
Polymorphic discriminator: type
UserInputStep
Input provided by the user.
content Content (optional)
No description provided.
Possible Types
Polymorphic discriminator: type
TextContent
A text content block.
No description provided.
Always set to "text".
Required. The text content.
annotations Annotation (optional)
Citation information for model-generated content.
Possible Types
Polymorphic discriminator: type
UrlCitation
A URL citation annotation.
No description provided.
Always set to "url_citation".
The URL.
The title of the URL.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
FileCitation
A file citation annotation.
No description provided.
Always set to "file_citation".
The URI of the file.
The name of the file.
Source attributed for a portion of the text.
User provided metadata about the retrieved context.
Page number of the cited document, if applicable.
Media ID, in the case of image citations, if applicable.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
PlaceCitation
A place citation annotation.
No description provided.
Always set to "place_citation".
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
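Because the citation start and end indices above are measured in bytes and the end is exclusive, slicing the Python string directly can return the wrong span when the text contains multi-byte characters. The sketch below slices the encoded bytes instead. It assumes the offsets refer to the UTF-8 encoding of the text, which the reference does not state, and the function and parameter names are hypothetical.

```python
def cited_segment(text: str, start_index: int, end_index: int) -> str:
    """Return the span of `text` covered by byte offsets [start_index, end_index)."""
    data = text.encode("utf-8")  # assumption: offsets index the UTF-8 bytes
    # errors="replace" avoids an exception if an offset falls inside a character.
    return data[start_index:end_index].decode("utf-8", errors="replace")


text = "Café prices rose 4% in 2023."
# "é" occupies two bytes in UTF-8, so byte offsets run ahead of character offsets.
print(cited_segment(text, 0, 5))   # Café
print(cited_segment(text, 6, 12))  # prices
```

For comparison, `text[0:5]` would return `"Café "` with a trailing space, because the character slice does not account for the two-byte `é`.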
ImageContent
An image content block.
No description provided.
Always set to "image".
The image content.
The URI of the image.
The mime type of the image.
Possible values:
- `image/png`: PNG image format
- `image/jpeg`: JPEG image format
- `image/webp`: WebP image format
- `image/heic`: HEIC image format
- `image/heif`: HEIF image format
- `image/gif`: GIF image format
- `image/bmp`: BMP image format
- `image/tiff`: TIFF image format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- `low`: Low resolution.
- `medium`: Medium resolution.
- `high`: High resolution.
- `ultra_high`: Ultra high resolution.
AudioContent
An audio content block.
No description provided.
Always set to "audio".
The audio content.
The URI of the audio.
The mime type of the audio.
Possible values:
- `audio/wav`: WAV audio format
- `audio/mp3`: MP3 audio format
- `audio/aiff`: AIFF audio format
- `audio/aac`: AAC audio format
- `audio/ogg`: OGG audio format
- `audio/flac`: FLAC audio format
- `audio/mpeg`: MPEG audio format
- `audio/m4a`: M4A audio format
- `audio/l16`: L16 audio format
- `audio/opus`: OPUS audio format
- `audio/alaw`: ALAW audio format
- `audio/mulaw`: MULAW audio format
The number of audio channels.
The sample rate of the audio.
DocumentContent
A document content block.
No description provided.
Always set to "document".
The document content.
The URI of the document.
The mime type of the document.
Possible values:
- `application/pdf`: PDF document format
- `text/csv`: CSV document format
VideoContent
A video content block.
No description provided.
Always set to "video".
The video content.
The URI of the video.
The mime type of the video.
Possible values:
- `video/mp4`: MP4 video format
- `video/mpeg`: MPEG video format
- `video/mpg`: MPG video format
- `video/mov`: MOV video format
- `video/avi`: AVI video format
- `video/x-flv`: FLV video format
- `video/webm`: WebM video format
- `video/wmv`: WMV video format
- `video/3gpp`: 3GPP video format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- `low`: Low resolution.
- `medium`: Medium resolution.
- `high`: High resolution.
- `ultra_high`: Ultra high resolution.
No description provided.
Always set to "user_input".
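The MIME types listed above for image, audio, document, and video content could be checked on the client before a user input step is built. The sketch below is one way to do that. The MIME strings are copied from the lists above, while the table name and the `content_type_for` helper are hypothetical and not part of the API.

```python
from typing import Optional

# MIME types copied from the content block reference above, keyed by the
# value of the content block's "type" discriminator.
SUPPORTED_MIME_TYPES = {
    "image": {
        "image/png", "image/jpeg", "image/webp", "image/heic",
        "image/heif", "image/gif", "image/bmp", "image/tiff",
    },
    "audio": {
        "audio/wav", "audio/mp3", "audio/aiff", "audio/aac",
        "audio/ogg", "audio/flac", "audio/mpeg", "audio/m4a",
        "audio/l16", "audio/opus", "audio/alaw", "audio/mulaw",
    },
    "document": {"application/pdf", "text/csv"},
    "video": {
        "video/mp4", "video/mpeg", "video/mpg", "video/mov", "video/avi",
        "video/x-flv", "video/webm", "video/wmv", "video/3gpp",
    },
}


def content_type_for(mime_type: str) -> Optional[str]:
    """Return the content block type that lists this MIME type, or None."""
    normalized = mime_type.strip().lower()
    for content_type, mime_types in SUPPORTED_MIME_TYPES.items():
        if normalized in mime_types:
            return content_type
    return None


print(content_type_for("image/png"))
print(content_type_for("image/svg+xml"))
```

Returning `None` for an unlisted type lets the caller decide whether to reject the file or convert it first.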
ModelOutputStep
Output generated by the model.
No description provided.
Always set to "model_output".
content Content (optional)
No description provided.
Possible Types
Polymorphic discriminator: type
TextContent
A text content block.
No description provided.
Always set to "text".
Required. The text content.
annotations Annotation (optional)
Citation information for model-generated content.
Possible Types
Polymorphic discriminator: type
UrlCitation
A URL citation annotation.
No description provided.
Always set to "url_citation".
The URL.
The title of the URL.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
FileCitation
A file citation annotation.
No description provided.
Always set to "file_citation".
The URI of the file.
The name of the file.
Source attributed for a portion of the text.
User provided metadata about the retrieved context.
Page number of the cited document, if applicable.
Media ID, in the case of image citations, if applicable.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
PlaceCitation
A place citation annotation.
No description provided.
Always set to "place_citation".
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
ImageContent
An image content block.
No description provided.
Always set to "image".
The image content.
The URI of the image.
The mime type of the image.
Possible values:
- `image/png`: PNG image format
- `image/jpeg`: JPEG image format
- `image/webp`: WebP image format
- `image/heic`: HEIC image format
- `image/heif`: HEIF image format
- `image/gif`: GIF image format
- `image/bmp`: BMP image format
- `image/tiff`: TIFF image format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- `low`: Low resolution.
- `medium`: Medium resolution.
- `high`: High resolution.
- `ultra_high`: Ultra high resolution.
AudioContent
An audio content block.
No description provided.
Always set to "audio".
The audio content.
The URI of the audio.
The mime type of the audio.
Possible values:
- `audio/wav`: WAV audio format
- `audio/mp3`: MP3 audio format
- `audio/aiff`: AIFF audio format
- `audio/aac`: AAC audio format
- `audio/ogg`: OGG audio format
- `audio/flac`: FLAC audio format
- `audio/mpeg`: MPEG audio format
- `audio/m4a`: M4A audio format
- `audio/l16`: L16 audio format
- `audio/opus`: OPUS audio format
- `audio/alaw`: ALAW audio format
- `audio/mulaw`: MULAW audio format
The number of audio channels.
The sample rate of the audio.
DocumentContent
A document content block.
No description provided.
Always set to "document".
The document content.
The URI of the document.
The mime type of the document.
Possible values:
- `application/pdf`: PDF document format
- `text/csv`: CSV document format
VideoContent
A video content block.
No description provided.
Always set to "video".
The video content.
The URI of the video.
The mime type of the video.
Possible values:
- `video/mp4`: MP4 video format
- `video/mpeg`: MPEG video format
- `video/mpg`: MPG video format
- `video/mov`: MOV video format
- `video/avi`: AVI video format
- `video/x-flv`: FLV video format
- `video/webm`: WebM video format
- `video/wmv`: WMV video format
- `video/3gpp`: 3GPP video format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- `low`: Low resolution.
- `medium`: Medium resolution.
- `high`: High resolution.
- `ultra_high`: Ultra high resolution.
ThoughtStep
A thought step.
No description provided.
Always set to "thought".
A signature hash for backend validation.
summary ThoughtSummaryContent (optional)
A summary of the thought.
Possible Types
Polymorphic discriminator: type
TextContent
A text content block.
No description provided.
Always set to "text".
Required. The text content.
annotations Annotation (optional)
Citation information for model-generated content.
Possible Types
Polymorphic discriminator: type
UrlCitation
A URL citation annotation.
No description provided.
Always set to "url_citation".
The URL.
The title of the URL.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
FileCitation
A file citation annotation.
No description provided.
Always set to "file_citation".
The URI of the file.
The name of the file.
Source attributed for a portion of the text.
User provided metadata about the retrieved context.
Page number of the cited document, if applicable.
Media ID, in the case of image citations, if applicable.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
PlaceCitation
A place citation annotation.
No description provided.
Always set to "place_citation".
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
ImageContent
An image content block.
No description provided.
Always set to "image".
The image content.
The URI of the image.
The mime type of the image.
Possible values:
- `image/png`: PNG image format
- `image/jpeg`: JPEG image format
- `image/webp`: WebP image format
- `image/heic`: HEIC image format
- `image/heif`: HEIF image format
- `image/gif`: GIF image format
- `image/bmp`: BMP image format
- `image/tiff`: TIFF image format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- `low`: Low resolution.
- `medium`: Medium resolution.
- `high`: High resolution.
- `ultra_high`: Ultra high resolution.
FunctionCallStep
A function tool call step.
No description provided.
Always set to "function_call".
Required. The name of the tool to call.
Required. The arguments to pass to the function.
Required. A unique ID for this specific tool call.
CodeExecutionCallStep
Code execution call step.
No description provided.
Always set to "code_execution_call".
arguments CodeExecutionCallStepArguments (required)
Required. The arguments to pass to the code execution.
Fields
Programming language of the `code`.
Possible values:
- `python`: Python >= 3.10, with numpy and simpy available.
The code to be executed.
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
UrlContextCallStep
URL context call step.
No description provided.
Always set to "url_context_call".
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
arguments UrlContextCallArguments (required)
The arguments to pass to the URL context.
Fields
The URLs to fetch.
McpServerToolCallStep
MCPServer tool call step.
No description provided.
Always set to "mcp_server_tool_call".
Required. The name of the tool which was called.
Required. The name of the used MCP server.
Required. The JSON object of arguments for the function.
Required. A unique ID for this specific tool call.
GoogleSearchCallStep
Google Search call step.
No description provided.
Always set to "google_search_call".
arguments GoogleSearchCallStepArguments (required)
Required. The arguments to pass to Google Search.
Fields
Web search queries for the follow-up web search.
The type of search grounding enabled.
Possible values:
- `web_search`: Setting this field enables web search. Only text results are returned.
- `image_search`: Setting this field enables image search. Image bytes are returned.
- `enterprise_web_search`: Setting this field enables enterprise web search.
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
FileSearchCallStep
File Search call step.
No description provided.
Always set to "file_search_call".
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
GoogleMapsCallStep
Google Maps call step.
No description provided.
Always set to "google_maps_call".
arguments GoogleMapsCallStepArguments (optional)
The arguments to pass to the Google Maps tool.
Fields
The queries to be executed.
Required. A unique ID for this specific tool call.
A signature hash for backend validation.
FunctionResultStep
Result of a function tool call.
No description provided.
Always set to "function_result".
The name of the tool that was called.
Whether the tool call resulted in an error.
Required. ID to match the ID from the function call block.
The result of the tool call.
CodeExecutionResultStep
Code execution result step.
No description provided.
Always set to "code_execution_result".
Required. The output of the code execution.
Whether the code execution resulted in an error.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
UrlContextResultStep
URL context result step.
No description provided.
Always set to "url_context_result".
result UrlContextResult (required)
Required. The results of the URL context.
Fields
The URL that was fetched.
The status of the URL retrieval.
Possible values:
- `success`: URL retrieval succeeded.
- `error`: URL retrieval failed due to an error.
- `paywall`: URL retrieval failed because the content is behind a paywall.
- `unsafe`: URL retrieval failed because the content is unsafe.
Whether the URL context resulted in an error.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
GoogleSearchResultStep
Google Search result step.
No description provided.
Always set to "google_search_result".
result GoogleSearchResultItem (required)
Required. The results of the Google Search.
Fields
Web content snippet that can be embedded in a web page or an app webview.
Whether the Google Search resulted in an error.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
McpServerToolResultStep
MCPServer tool result step.
No description provided.
Always set to "mcp_server_tool_result".
The name of the tool that was called for this specific tool call.
The name of the used MCP server.
Required. ID to match the ID from the function call block.
The output from the MCP server call. Can be simple text or rich content.
FileSearchResultStep
File Search result step.
No description provided.
Always set to "file_search_result".
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
GoogleMapsResultStep
Google Maps result step.
No description provided.
Always set to "google_maps_result".
result GoogleMapsResultItem (required)
No description provided.
Fields
places GoogleMapsResultPlaces (optional)
No description provided.
Fields
No description provided.
No description provided.
No description provided.
review_snippets ReviewSnippet (optional)
No description provided.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
No description provided.
Required. ID to match the ID from the function call block.
A signature hash for backend validation.
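Each result step above carries an ID that matches the ID of its call step, so a client can pair the two when replaying a list of steps. The sketch below shows one way to do that, relying on the `type` discriminator values ending in `_call` and `_result` as listed above. It assumes steps are plain dicts, that a call step stores its ID under `id`, and that a result step stores the matching ID under `call_id`. Both key names are assumptions, since the reference does not name them.

```python
from typing import Any, Optional

Step = dict[str, Any]


def pair_calls_with_results(
    steps: list[Step],
) -> tuple[list[tuple[Optional[Step], Step]], list[Step]]:
    """Return (pairs, pending): matched (call, result) pairs and unanswered calls."""
    open_calls: dict[str, Step] = {}
    pairs: list[tuple[Optional[Step], Step]] = []
    for step in steps:
        step_type = step.get("type", "")
        if step_type.endswith("_call"):
            # "id" is the assumed key for a call step's unique ID.
            open_calls[step["id"]] = step
        elif step_type.endswith("_result"):
            # "call_id" is the assumed key linking a result to its call.
            call = open_calls.pop(step.get("call_id"), None)
            pairs.append((call, step))
    return pairs, list(open_calls.values())


# Hypothetical step list, for illustration only.
steps = [
    {"type": "function_call", "id": "c1", "name": "get_weather",
     "arguments": {"city": "Paris"}},
    {"type": "google_search_call", "id": "c2"},
    {"type": "function_result", "call_id": "c1", "result": "18C"},
]
pairs, pending = pair_calls_with_results(steps)
print(len(pairs), len(pending))
```

A result whose call was never seen is still returned, paired with `None`, so that nothing is silently dropped.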
The event_id token to be used to resume the interaction stream, from this event.
metadata StreamMetadata (optional)
Optional metadata accompanying ANY streamed event.
Fields
total_usage Usage (optional)
No description provided.
Fields
Number of tokens in the prompt (context).
input_tokens_by_modality ModalityTokens (optional)
A breakdown of input token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- `text`: Indicates the model should return text.
- `image`: Indicates the model should return images.
- `audio`: Indicates the model should return audio.
- `video`: Indicates the model should return video.
- `document`: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens in the cached part of the prompt (the cached content).
cached_tokens_by_modality ModalityTokens (optional)
A breakdown of cached token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- `text`: Indicates the model should return text.
- `image`: Indicates the model should return images.
- `audio`: Indicates the model should return audio.
- `video`: Indicates the model should return video.
- `document`: Indicates the model should return documents.
Number of tokens for the modality.
Total number of tokens across all the generated responses.
output_tokens_by_modality ModalityTokens (optional)
A breakdown of output token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- `text`: Indicates the model should return text.
- `image`: Indicates the model should return images.
- `audio`: Indicates the model should return audio.
- `video`: Indicates the model should return video.
- `document`: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens present in tool-use prompt(s).
tool_use_tokens_by_modality ModalityTokens (optional)
A breakdown of tool-use token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values
- `text`: Indicates the model should return text.
- `image`: Indicates the model should return images.
- `audio`: Indicates the model should return audio.
- `video`: Indicates the model should return video.
- `document`: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens of thoughts for thinking models.
Total token count for the interaction request (prompt + responses + other internal tokens).
grounding_tool_count GroundingToolCount (optional)
Grounding tool count.
Fields
The grounding tool type associated with the count.
Possible values:
- `google_search`: Grounding with Google Web Search, Image Search, and Web Grounding for Enterprise.
- `google_maps`: Grounding with Google Maps.
- `retrieval`: Grounding with the customer's data, for example, Vertex AI Search.
The count for this grounding tool type.
StepDelta
No description provided.
Always set to "step.delta".
No description provided.
delta StepDeltaData (required)
No description provided.
Possible Types
Polymorphic discriminator: type
TextDelta
No description provided.
Always set to "text".
No description provided.
ImageDelta
No description provided.
Always set to "image".
No description provided.
No description provided.
No description provided.
Possible values:
- `image/png`: PNG image format
- `image/jpeg`: JPEG image format
- `image/webp`: WebP image format
- `image/heic`: HEIC image format
- `image/heif`: HEIF image format
- `image/gif`: GIF image format
- `image/bmp`: BMP image format
- `image/tiff`: TIFF image format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- `low`: Low resolution.
- `medium`: Medium resolution.
- `high`: High resolution.
- `ultra_high`: Ultra high resolution.
AudioDelta
No description provided.
Always set to "audio".
No description provided.
No description provided.
No description provided.
Possible values:
- `audio/wav`: WAV audio format
- `audio/mp3`: MP3 audio format
- `audio/aiff`: AIFF audio format
- `audio/aac`: AAC audio format
- `audio/ogg`: OGG audio format
- `audio/flac`: FLAC audio format
- `audio/mpeg`: MPEG audio format
- `audio/m4a`: M4A audio format
- `audio/l16`: L16 audio format
- `audio/opus`: OPUS audio format
- `audio/alaw`: ALAW audio format
- `audio/mulaw`: MULAW audio format
The sample rate of the audio.
The number of audio channels.
DocumentDelta
No description provided.
Always set to "document".
No description provided.
No description provided.
No description provided.
Possible values:
- `application/pdf`: PDF document format
- `text/csv`: CSV document format
VideoDelta
No description provided.
Always set to "video".
No description provided.
No description provided.
No description provided.
Possible values:
- `video/mp4`: MP4 video format
- `video/mpeg`: MPEG video format
- `video/mpg`: MPG video format
- `video/mov`: MOV video format
- `video/avi`: AVI video format
- `video/x-flv`: FLV video format
- `video/webm`: WebM video format
- `video/wmv`: WMV video format
- `video/3gpp`: 3GPP video format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- `low`: Low resolution.
- `medium`: Medium resolution.
- `high`: High resolution.
- `ultra_high`: Ultra high resolution.
ThoughtSummaryDelta
No description provided.
Always set to "thought_summary".
content Content (optional)
A new summary item to be added to the thought.
Possible Types
Polymorphic discriminator: type
TextContent
A text content block.
No description provided.
Always set to "text".
Required. The text content.
annotations Annotation (optional)
Citation information for model-generated content.
Possible Types
Polymorphic discriminator: type
UrlCitation
A URL citation annotation.
No description provided.
Always set to "url_citation".
The URL.
The title of the URL.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
FileCitation
A file citation annotation.
No description provided.
Always set to "file_citation".
The URI of the file.
The name of the file.
Source attributed for a portion of the text.
User provided metadata about the retrieved context.
Page number of the cited document, if applicable.
Media ID, in the case of image citations, if applicable.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
PlaceCitation
A place citation annotation.
No description provided.
Always set to "place_citation".
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
ImageContent
An image content block.
No description provided.
Always set to "image".
The image content.
The URI of the image.
The mime type of the image.
Possible values:
- `image/png`: PNG image format
- `image/jpeg`: JPEG image format
- `image/webp`: WebP image format
- `image/heic`: HEIC image format
- `image/heif`: HEIF image format
- `image/gif`: GIF image format
- `image/bmp`: BMP image format
- `image/tiff`: TIFF image format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- `low`: Low resolution.
- `medium`: Medium resolution.
- `high`: High resolution.
- `ultra_high`: Ultra high resolution.
AudioContent
An audio content block.
No description provided.
Always set to "audio".
The audio content.
The URI of the audio.
The mime type of the audio.
Possible values:
- `audio/wav`: WAV audio format
- `audio/mp3`: MP3 audio format
- `audio/aiff`: AIFF audio format
- `audio/aac`: AAC audio format
- `audio/ogg`: OGG audio format
- `audio/flac`: FLAC audio format
- `audio/mpeg`: MPEG audio format
- `audio/m4a`: M4A audio format
- `audio/l16`: L16 audio format
- `audio/opus`: OPUS audio format
- `audio/alaw`: ALAW audio format
- `audio/mulaw`: MULAW audio format
The number of audio channels.
The sample rate of the audio.
DocumentContent
A document content block.
No description provided.
Always set to "document".
The document content.
The URI of the document.
The mime type of the document.
Possible values:
- `application/pdf`: PDF document format
- `text/csv`: CSV document format
VideoContent
A video content block.
No description provided.
Always set to "video".
The video content.
The URI of the video.
The mime type of the video.
Possible values:
- `video/mp4`: MP4 video format
- `video/mpeg`: MPEG video format
- `video/mpg`: MPG video format
- `video/mov`: MOV video format
- `video/avi`: AVI video format
- `video/x-flv`: FLV video format
- `video/webm`: WebM video format
- `video/wmv`: WMV video format
- `video/3gpp`: 3GPP video format
resolution MediaResolution (optional)
The resolution of the media.
Possible values
- `low`: Low resolution.
- `medium`: Medium resolution.
- `high`: High resolution.
- `ultra_high`: Ultra high resolution.
ThoughtSignatureDelta
No description provided.
Always set to "thought_signature".
Signature to match the backend source to be part of the generation.
TextAnnotationDelta
No description provided.
Always set to "text_annotation_delta".
annotations Annotation (optional)
Citation information for model-generated content.
Possible Types
Polymorphic discriminator: type
UrlCitation
A URL citation annotation.
No description provided.
Always set to "url_citation".
The URL.
The title of the URL.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
FileCitation
A file citation annotation.
No description provided.
Always set to "file_citation".
The URI of the file.
The name of the file.
Source attributed for a portion of the text.
User provided metadata about the retrieved context.
Page number of the cited document, if applicable.
Media ID, in the case of image citations, if applicable.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
PlaceCitation
A place citation annotation.
No description provided.
Always set to "place_citation".
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes.
End of the attributed segment, exclusive.
ArgumentsDelta
No description provided.
Always set to "arguments_delta".
No description provided.
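Streamed `step.delta` events deliver a step in fragments, so a client typically has to reassemble them. The sketch below concatenates the fragments of `text` deltas per step. It is heavily assumption-based: the reference above leaves the relevant fields undescribed, so the `event_type`, `index`, and `text` key names used here are hypothetical, and events are assumed to be already-parsed dicts.

```python
from collections import defaultdict
from typing import Any, Iterable


def collect_text(events: Iterable[dict[str, Any]]) -> dict[int, str]:
    """Concatenate streamed text fragments, grouped by (assumed) step index."""
    parts: dict[int, list[str]] = defaultdict(list)
    for event in events:
        # "event_type" is an assumed name for the event discriminator.
        if event.get("event_type") != "step.delta":
            continue
        delta = event.get("delta") or {}
        if delta.get("type") == "text":
            # "index" and "text" are assumed key names; see the note above.
            parts[event.get("index", 0)].append(delta.get("text", ""))
    return {index: "".join(chunks) for index, chunks in parts.items()}


# Hypothetical event sequence, for illustration only.
events = [
    {"event_type": "step.delta", "index": 0, "delta": {"type": "text", "text": "Hello, "}},
    {"event_type": "step.delta", "index": 0, "delta": {"type": "thought_signature"}},
    {"event_type": "step.delta", "index": 0, "delta": {"type": "text", "text": "world"}},
]
print(collect_text(events))
```

Other delta types, such as `arguments_delta`, could be accumulated the same way once their field names are confirmed against the actual payloads.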
CodeExecutionCallDelta
No description provided.
Always set to "code_execution_call".
arguments CodeExecutionCallArguments (required)
No description provided.
Fields
Programming language of the `code`.
Possible values:
- `python`: Python >= 3.10, with numpy and simpy available.
The code to be executed.
A signature hash for backend validation.
UrlContextCallDelta
No description provided.
Always set to "url_context_call".
arguments UrlContextCallArguments (required)
No description provided.
Fields
The URLs to fetch.
A signature hash for backend validation.
GoogleSearchCallDelta
No description provided.
Always set to "google_search_call".
arguments GoogleSearchCallArguments (required)
No description provided.
Fields
Web search queries for the follow-up web search.
A signature hash for backend validation.
McpServerToolCallDelta
No description provided.
Always set to "mcp_server_tool_call".
No description provided.
No description provided.
No description provided.
FileSearchCallDelta
No description provided.
Always set to "file_search_call".
A signature hash for backend validation.
GoogleMapsCallDelta
No description provided.
Always set to "google_maps_call".
arguments GoogleMapsCallArguments (optional)
The arguments to pass to the Google Maps tool.
Fields
The queries to be executed.
A signature hash for backend validation.
CodeExecutionResultDelta
No description provided.
Always set to "code_execution_result".
No description provided.
No description provided.
A signature hash for backend validation.
UrlContextResultDelta
No description provided.
Always set to "url_context_result".
result UrlContextResult (required)
No description provided.
Fields
The URL that was fetched.
The status of the URL retrieval.
Possible values:
- success: URL retrieval succeeded.
- error: URL retrieval failed due to an error.
- paywall: URL retrieval failed because the content is behind a paywall.
- unsafe: URL retrieval failed because the content is unsafe.
No description provided.
A signature hash for backend validation.
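A client will typically branch on the retrieval status before using fetched content. The sketch below is illustrative only: the status values come from the list above, but the dictionary keys `url` and `status` are assumed field names, and `summarize_url_result` is a hypothetical helper.

```python
# Status values as listed above; the keys "url" and "status" are assumptions.
FAILURE_REASONS = {
    "error": "retrieval failed due to an error",
    "paywall": "content is behind a paywall",
    "unsafe": "content was flagged as unsafe",
}


def summarize_url_result(result: dict) -> str:
    """Describe one URL context result in a single line."""
    url = result.get("url", "<unknown>")
    status = result.get("status")
    if status == "success":
        return f"{url}: fetched"
    # Fall back to a generic message for any status not listed above
    reason = FAILURE_REASONS.get(status, f"unrecognized status {status!r}")
    return f"{url}: not fetched ({reason})"


print(summarize_url_result({"url": "https://example.com", "status": "paywall"}))
```

Handling unknown statuses explicitly keeps the client working if further values are added later.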
GoogleSearchResultDelta
No description provided.
Always set to "google_search_result".
result GoogleSearchResult (required)
No description provided.
Fields
Web content snippet that can be embedded in a web page or an app webview.
No description provided.
A signature hash for backend validation.
McpServerToolResultDelta
No description provided.
Always set to "mcp_server_tool_result".
No description provided.
No description provided.
No description provided.
FileSearchResultDelta
No description provided.
Always set to "file_search_result".
result FileSearchResult (required)
No description provided.
A signature hash for backend validation.
GoogleMapsResultDelta
No description provided.
Always set to "google_maps_result".
result GoogleMapsResult (optional)
The results of the Google Maps call.
Fields
places Places (optional)
The places that were found.
Fields
The ID of the place, in `places/{place_id}` format.
Title of the place.
URI reference of the place.
review_snippets ReviewSnippet (optional)
Snippets of reviews that are used to generate answers about the features of a given place in Google Maps.
Fields
Title of the review.
A link that corresponds to the user review on Google Maps.
The ID of the review snippet.
Resource name of the Google Maps widget context token.
A signature hash for backend validation.
FunctionResultDelta
No description provided.
Always set to "function_result".
No description provided.
No description provided.
Required. ID to match the ID from the function call block.
No description provided.
The event_id token used to resume the interaction stream from this event.
metadata StepDeltaMetadata (optional)
Optional metadata accompanying ANY streamed event.
Fields
total_usage Usage (optional)
Statistics on the interaction request's token usage.
Fields
Number of tokens in the prompt (context).
input_tokens_by_modality ModalityTokens (optional)
A breakdown of input token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values:
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens in the cached part of the prompt (the cached content).
cached_tokens_by_modality ModalityTokens (optional)
A breakdown of cached token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values:
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Total number of tokens across all the generated responses.
output_tokens_by_modality ModalityTokens (optional)
A breakdown of output token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values:
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens present in tool-use prompt(s).
tool_use_tokens_by_modality ModalityTokens (optional)
A breakdown of tool-use token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values:
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens of thoughts for thinking models.
Total token count for the interaction request (prompt + responses + other internal tokens).
grounding_tool_count GroundingToolCount (optional)
Grounding tool count.
Fields
The grounding tool type associated with the count.
Possible values:
- google_search: Grounding with Google Web Search and Image Search, and Web Grounding for Enterprise.
- google_maps: Grounding with Google Maps.
- retrieval: Grounding with customer's data, for example, VertexAISearch.
The count for this grounding tool type.
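The per-modality breakdowns above can be totalled on the client side. The following is a rough sketch under stated assumptions: each `*_tokens_by_modality` field is treated as a list of entries, and the token count inside each entry is assumed to be stored under a key named `tokens` (the actual field name is not shown in this reference). The sample numbers are invented for illustration.

```python
from collections import Counter

# Breakdown field names as they appear in the Usage structure above
BREAKDOWN_FIELDS = (
    "input_tokens_by_modality",
    "cached_tokens_by_modality",
    "output_tokens_by_modality",
    "tool_use_tokens_by_modality",
)


def tokens_per_modality(usage: dict) -> dict:
    """Sum token counts per modality, separately for each breakdown field."""
    totals = {}
    for field in BREAKDOWN_FIELDS:
        counts = Counter()
        for entry in usage.get(field, []):  # assumption: a list of entries
            counts[entry["modality"]] += entry.get("tokens", 0)  # "tokens" is assumed
        totals[field] = dict(counts)
    return totals


sample_usage = {  # invented sample data
    "input_tokens_by_modality": [
        {"modality": "text", "tokens": 120},
        {"modality": "image", "tokens": 258},
    ],
    "output_tokens_by_modality": [{"modality": "text", "tokens": 40}],
}
print(tokens_per_modality(sample_usage)["input_tokens_by_modality"])
```

The breakdowns are kept separate rather than merged, since cached tokens are described as part of the prompt and adding them to the input figures could double count.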
StepStop
No description provided.
Always set to "step.stop".
No description provided.
The event_id token used to resume the interaction stream from this event.
metadata StreamMetadata (optional)
Optional metadata accompanying ANY streamed event.
Fields
total_usage Usage (optional)
No description provided.
Fields
Number of tokens in the prompt (context).
input_tokens_by_modality ModalityTokens (optional)
A breakdown of input token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values:
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens in the cached part of the prompt (the cached content).
cached_tokens_by_modality ModalityTokens (optional)
A breakdown of cached token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values:
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Total number of tokens across all the generated responses.
output_tokens_by_modality ModalityTokens (optional)
A breakdown of output token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values:
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens present in tool-use prompt(s).
tool_use_tokens_by_modality ModalityTokens (optional)
A breakdown of tool-use token usage by modality.
Fields
modality ResponseModality (optional)
The modality associated with the token count.
Possible values:
- text: Indicates the model should return text.
- image: Indicates the model should return images.
- audio: Indicates the model should return audio.
- video: Indicates the model should return video.
- document: Indicates the model should return documents.
Number of tokens for the modality.
Number of tokens of thoughts for thinking models.
Total token count for the interaction request (prompt + responses + other internal tokens).
grounding_tool_count GroundingToolCount (optional)
Grounding tool count.
Fields
The grounding tool type associated with the count.
Possible values:
- google_search: Grounding with Google Web Search and Image Search, and Web Grounding for Enterprise.
- google_maps: Grounding with Google Maps.
- retrieval: Grounding with customer's data, for example, VertexAISearch.
The count for this grounding tool type.
Examples
Interaction Created
{ "event_type": "interaction.created", "interaction": { "id": "v1_ChdXS0l4YWZXTk9xbk0xZThQczhEcmlROBIXV0tJeGFmV05PcW5NMWU4UHM4RHJpUTg", "model": "gemini-3.5-flash", "status": "in_progress", "created": "2025-12-04T15:01:45Z", "updated": "2025-12-04T15:01:45Z" }, "event_id": "evt_123" }
Interaction Created
{ "event_type": "interaction.created", "interaction": { "id": "v1_ChdXS0l4YWZXTk9xbk0xZThQczhEcmlROBIXV0tJeGFmV05PcW5NMWU4UHM4RHJpUTg", "model": "gemini-3-flash-preview", "object": "interaction", "status": "in_progress" }, "event_id": "evt_123" }
Interaction Completed
{ "event_type": "interaction.completed", "interaction": { "id": "v1_ChdXS0l4YWZXTk9xbk0xZThQczhEcmlROBIXV0tJeGFmV05PcW5NMWU4UHM4RHJpUTg", "model": "gemini-3.5-flash", "status": "completed", "created": "2025-12-04T15:01:45Z", "updated": "2025-12-04T15:01:45Z" }, "event_id": "evt_123" }
Interaction Completed
{ "event_type": "interaction.completed", "interaction": { "id": "v1_ChdXS0l4YWZXTk9xbk0xZThQczhEcmlROBIXV0tJeGFmV05PcW5NMWU4UHM4RHJpUTg", "model": "gemini-3-flash-preview", "object": "interaction", "status": "completed", "created": "2025-12-04T15:01:45Z", "updated": "2025-12-04T15:01:45Z" }, "event_id": "evt_123" }
Interaction Status Update
{ "event_type": "interaction.status_update", "interaction_id": "v1_ChdTMjQ0YWJ5TUF1TzcxZThQdjRpcnFRcxIXUzI0NGFieU1BdU83MWU4UHY0aXJxUXM", "status": "in_progress" }
Error Event
{ "event_type": "error", "error": { "message": "Failed to get completed interaction: Result not found.", "code": "not_found" } }
Step Start
{ "event_type": "step.start", "index": 0, "step": { "type": "model_output" } }
Step Delta
{ "event_type": "step.delta", "index": 0, "delta": { "type": "text", "text": "Hello" } }
Step Stop
{ "event_type": "step.stop", "index": 0 }
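The example events above can be folded into a single client-side loop. The sketch below is one possible way to do that, not the SDK's own implementation: it assumes each event has already been received as a JSON string (the transport is not covered here), concatenates text deltas per step index, and remembers the most recent `event_id` so that a stream could later be resumed. The function name `consume_events`, the shortened interaction ID, the second delta, and `evt_124` are made up for illustration.

```python
import json


def consume_events(lines):
    """Fold streamed events into (text per step index, last status, last event_id)."""
    texts = {}            # step index -> accumulated text
    status = None
    last_event_id = None
    for line in lines:
        event = json.loads(line)
        # Not every event carries an event_id; keep the latest one seen
        last_event_id = event.get("event_id", last_event_id)
        kind = event["event_type"]
        if kind == "step.start":
            texts.setdefault(event["index"], "")
        elif kind == "step.delta" and event["delta"].get("type") == "text":
            index = event["index"]
            texts[index] = texts.get(index, "") + event["delta"]["text"]
        elif kind in ("interaction.created", "interaction.completed"):
            status = event["interaction"]["status"]
        elif kind == "interaction.status_update":
            status = event["status"]
        elif kind == "error":
            error = event["error"]
            raise RuntimeError(f'{error["code"]}: {error["message"]}')
    return texts, status, last_event_id


sample = [
    '{"event_type": "interaction.created", "interaction": {"id": "v1_example", "status": "in_progress"}, "event_id": "evt_123"}',
    '{"event_type": "step.start", "index": 0, "step": {"type": "model_output"}}',
    '{"event_type": "step.delta", "index": 0, "delta": {"type": "text", "text": "Hello"}}',
    '{"event_type": "step.delta", "index": 0, "delta": {"type": "text", "text": ", world"}}',
    '{"event_type": "step.stop", "index": 0}',
    '{"event_type": "interaction.completed", "interaction": {"id": "v1_example", "status": "completed"}, "event_id": "evt_124"}',
]
texts, status, last_event_id = consume_events(sample)
print(texts[0], status, last_event_id)
```

Non-text deltas such as tool calls and results are ignored in this sketch; a fuller client would dispatch on the delta `type` values listed earlier.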