AI Gateway correctly tracks costs and token usage for non-streaming Azure OpenAI requests, but fails to do so for streaming (SSE) responses.
Observed behavior:
- Non-streaming (stream: false): Model shows as gpt-5.3-chat with correct token counts and costs
- Streaming (stream: true): Model shows as gpt-5.3 with no token counts or costs
The streaming response DOES include usage data in the final SSE chunk (verified by logging the raw response):
data: {"choices":[],"model":"gpt-5.3-chat-2026-03-03","usage":{"completion_tokens":137,"prompt_tokens":3886,"total_tokens":4023}}
data: [DONE]
The stream_options: { include_usage: true } flag is set in the request body, and Azure returns the usage correctly. Gateway just doesn't read it from the streamed response.
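For reference, here is a minimal sketch of how the usage can be recovered from a stream shaped like the one above. It assumes the usage arrives in a `data:` chunk before `data: [DONE]`, as in the logged response. `extract_final_usage` is a hypothetical helper written for this report, not Gateway code or part of any SDK.

```python
import json
from typing import Iterable, Optional


def extract_final_usage(sse_lines: Iterable[str]) -> Optional[dict]:
    """Return {"model": ..., "usage": ...} from the last chunk that carries
    a non-empty `usage` object, or None if no chunk does.

    Hypothetical helper: assumes each event is a single `data:` line
    containing either a JSON object or the literal [DONE].
    """
    result = None
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore anything that is not valid JSON
        if isinstance(chunk, dict) and chunk.get("usage"):
            result = {"model": chunk.get("model"), "usage": chunk["usage"]}
    return result


# The two lines logged from the raw Azure response in this report.
stream = [
    'data: {"choices":[],"model":"gpt-5.3-chat-2026-03-03","usage":{"completion_tokens":137,"prompt_tokens":3886,"total_tokens":4023}}',
    "",
    "data: [DONE]",
]
final = extract_final_usage(stream)
```

Run against the logged lines, this yields the full model name and all three token counts, so the data needed for cost tracking is present in the stream.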
For Cloudflare employees, an example request ID is 01KMY0CE9AKNBRFBHB7J9W4X2G