Streaming
Tensoras supports server-sent event (SSE) streaming for chat completions. Instead of waiting for the entire response to be generated, you receive tokens as they are produced, enabling real-time display in chat interfaces and reducing perceived latency.
How It Works
Set stream=True (Python) or stream: true (Node.js / REST) in your chat completions request. The API returns a stream of server-sent events, each containing a chunk with the same shape as OpenAI’s streaming format — delta objects inside choices.
The final event in every stream is [DONE], signaling that the response is complete.
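On the wire, each event arrives as a `data:` line carrying one JSON chunk, and the stream ends with the literal `[DONE]` sentinel. An illustrative sketch of the raw SSE frames (field values are placeholders):

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Open"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```

The SDKs parse these frames for you; you only see chunk objects.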
Python

```python
from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about open-source AI."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

print()  # newline after stream completes
```

Node.js
```javascript
import Tensoras from "tensoras";

const client = new Tensoras({ apiKey: "tns_your_key_here" });

const stream = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a haiku about open-source AI." },
  ],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0].delta;
  if (delta.content) {
    process.stdout.write(delta.content);
  }
}
console.log(); // newline after stream completes
```

REST / cURL
```bash
curl https://api.tensoras.ai/v1/chat/completions \
  -H "Authorization: Bearer tns_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a haiku about open-source AI."}
    ],
    "stream": true
  }'
```

Chunk Format
Each streamed chunk follows the OpenAI format:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1700000000,
  "model": "llama-3.3-70b",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Open"
      },
      "finish_reason": null
    }
  ]
}
```

- The first chunk includes `delta.role` set to `"assistant"`.
- Subsequent chunks contain `delta.content` with the next token(s).
- The final chunk sets `finish_reason` to `"stop"` (or `"tool_calls"` when streaming tool calls) and `delta` is empty.
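The delta-accumulation pattern implied by this chunk sequence can be sketched without a network call. Here plain dicts stand in for the chunk objects (the SDK exposes the same fields as attributes):

```python
# Mock chunks standing in for a streamed response, in the same shape
# as the chat.completion.chunk objects above.
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Open"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "-source"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

role = None
parts = []
finish_reason = None

for chunk in chunks:
    choice = chunk["choices"][0]
    delta = choice["delta"]
    role = delta.get("role", role)       # arrives once, on the first chunk
    if delta.get("content"):
        parts.append(delta["content"])   # append each token fragment
    if choice["finish_reason"]:
        finish_reason = choice["finish_reason"]

message = "".join(parts)
print(role, repr(message), finish_reason)  # assistant 'Open-source' stop
```

The same loop works unchanged on a live stream: replace the `chunks` list with the iterator returned by `client.chat.completions.create(..., stream=True)` and switch dict lookups to attribute access.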
Token Counting During Streaming
Usage information is included in the last chunk of the stream when the response is complete:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "choices": [{ "index": 0, "delta": {}, "finish_reason": "stop" }],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 17,
    "total_tokens": 45
  }
}
```

You can also pass `stream_options: { include_usage: true }` to ensure usage is always returned in the final chunk.
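Capturing usage from a stream follows the same pattern: only the final chunk carries a `usage` field, so keep the last non-empty value you see. A sketch with mock dicts (with the SDK you would check `chunk.usage`):

```python
# Mock stream: intermediate chunks have no usage; the final chunk does.
chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": None}],
     "usage": None},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
     "usage": {"prompt_tokens": 28, "completion_tokens": 17, "total_tokens": 45}},
]

usage = None
for chunk in chunks:
    if chunk.get("usage"):   # present only on the final chunk
        usage = chunk["usage"]

print(usage["total_tokens"])  # 45
```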
Tips
- All models support streaming. You can use `stream=True` with any model available on Tensoras, including reasoning models like `deepseek-r1-distill-70b`.
- Tool calls stream too. When the model invokes a tool, tool call arguments are streamed incrementally in `delta.tool_calls`. See Tool Calling for details.
- Structured output works with streaming. JSON mode and JSON schema mode both work alongside `stream=True`. See Structured Outputs.
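Streamed tool call arguments arrive as string fragments that you concatenate per tool call before parsing. A minimal sketch, assuming the OpenAI-style `delta.tool_calls` shape where each fragment carries an `index`; the tool name `get_weather` and its arguments are hypothetical:

```python
import json

# Mock fragments of one streamed tool call (arguments arrive as partial JSON).
chunks = [
    {"delta": {"tool_calls": [{"index": 0, "id": "call_1",
                               "function": {"name": "get_weather", "arguments": ""}}]}},
    {"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "{\"city\": "}}]}},
    {"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "\"Paris\"}"}}]}},
]

calls = {}  # index -> accumulated {"name": ..., "arguments": ...}
for chunk in chunks:
    for tc in chunk["delta"].get("tool_calls", []):
        entry = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
        fn = tc.get("function", {})
        if fn.get("name"):
            entry["name"] = fn["name"]       # name arrives on the first fragment
        entry["arguments"] += fn.get("arguments", "")

args = json.loads(calls[0]["arguments"])     # parse once the stream is done
print(calls[0]["name"], args)  # get_weather {'city': 'Paris'}
```

Wait for `finish_reason: "tool_calls"` before parsing: the accumulated arguments string is not valid JSON until the last fragment lands.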
Related
- Tool Calling — streaming tool call arguments
- Structured Outputs — JSON mode with streaming
- Chat Completions API — full endpoint reference
- Python SDK — SDK installation and setup
- Node.js SDK — SDK installation and setup