Streaming

Tensoras supports server-sent event (SSE) streaming for chat completions. Instead of waiting for the entire response to be generated, you receive tokens as they are produced, enabling real-time display in chat interfaces and reducing perceived latency.

How It Works

Set stream=True (Python) or stream: true (Node.js / REST) in your chat completions request. The API returns a stream of server-sent events, each containing a chunk with the same shape as OpenAI’s streaming format — delta objects inside choices.

The final event in every stream is [DONE], signaling that the response is complete.

Python

from tensoras import Tensoras
 
client = Tensoras(api_key="tns_your_key_here")
 
stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about open-source AI."},
    ],
    stream=True,
)
 
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
 
print()  # newline after stream completes

Node.js

import Tensoras from "tensoras";
 
const client = new Tensoras({ apiKey: "tns_your_key_here" });
 
const stream = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a haiku about open-source AI." },
  ],
  stream: true,
});
 
for await (const chunk of stream) {
  const delta = chunk.choices[0].delta;
  if (delta.content) {
    process.stdout.write(delta.content);
  }
}
 
console.log(); // newline after stream completes

REST / cURL

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Authorization: Bearer tns_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a haiku about open-source AI."}
    ],
    "stream": true
  }'
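With cURL, the response arrives as raw SSE lines, each prefixed with `data: `. A minimal sketch of parsing that wire format yourself (the sample payloads below are illustrative, not captured from a real response):

```python
import json

# Hypothetical raw SSE lines as the server might send them.
raw_events = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "Open"}, "finish_reason": null}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": " weights"}, "finish_reason": null}]}',
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]

def parse_sse(lines):
    """Yield parsed chunk dicts, stopping at the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments and keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)

text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in parse_sse(raw_events)
)
print(text)  # → Open weights
```

The SDKs in the examples above do this parsing for you; hand-rolling it is only needed when calling the REST endpoint directly.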

Chunk Format

Each streamed chunk follows the OpenAI format:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1700000000,
  "model": "llama-3.3-70b",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Open"
      },
      "finish_reason": null
    }
  ]
}
  • The first chunk includes delta.role set to "assistant".
  • Subsequent chunks contain delta.content with the next token(s).
  • The final chunk sets finish_reason to "stop" (or "tool_calls" when streaming tool calls) and delta is empty.
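Putting these rules together, a client reconstructs the full message by accumulating deltas. A minimal sketch over illustrative chunk dicts (not real API output):

```python
# Illustrative chunks following the format above.
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Code"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": " flows"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

message = {"role": None, "content": ""}
finish_reason = None
for chunk in chunks:
    choice = chunk["choices"][0]
    delta = choice["delta"]
    if "role" in delta:
        message["role"] = delta["role"]          # first chunk carries the role
    if delta.get("content"):
        message["content"] += delta["content"]   # later chunks carry token text
    if choice["finish_reason"]:
        finish_reason = choice["finish_reason"]  # final chunk signals completion

print(message)        # {'role': 'assistant', 'content': 'Code flows'}
print(finish_reason)  # stop
```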

Token Counting During Streaming

Usage information is included in the last chunk of the stream when the response is complete:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "choices": [{ "index": 0, "delta": {}, "finish_reason": "stop" }],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 17,
    "total_tokens": 45
  }
}

You can also pass stream_options: { include_usage: true } to ensure usage is always returned in the final chunk.
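Since only the final chunk carries the usage object, a client should tolerate its absence on every earlier chunk. A minimal sketch of reading it, over illustrative chunk dicts (not real API output):

```python
# Illustrative chunks: usage is absent (or null) until the final chunk.
chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": None}], "usage": None},
    {
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
        "usage": {"prompt_tokens": 28, "completion_tokens": 17, "total_tokens": 45},
    },
]

usage = None
for chunk in chunks:
    if chunk.get("usage"):
        usage = chunk["usage"]  # only the final chunk carries usage

print(usage["total_tokens"])  # → 45
```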

Tips

  • All models support streaming. You can use stream=True with any model available on Tensoras, including reasoning models like deepseek-r1-distill-70b.
  • Tool calls stream too. When the model invokes a tool, tool call arguments are streamed incrementally in delta.tool_calls. See Tool Calling for details.
  • Structured output works with streaming. JSON mode and JSON schema mode both work alongside stream=True. See Structured Outputs.
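When tool calls stream, the arguments arrive as JSON string fragments spread across chunks, keyed by an index so parallel calls can interleave. A minimal sketch of stitching fragments back together, over illustrative delta.tool_calls dicts (the tool name and fields here are hypothetical, not real API output):

```python
import json

# Illustrative delta.tool_calls fragments: the first carries id and name,
# later ones carry pieces of the JSON arguments string.
deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city": '}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"Paris"}'}}]},
]

calls = {}  # index -> accumulated tool call
for delta in deltas:
    for tc in delta["tool_calls"]:
        call = calls.setdefault(tc["index"], {"id": None, "name": None, "arguments": ""})
        if tc.get("id"):
            call["id"] = tc["id"]
        fn = tc.get("function", {})
        if fn.get("name"):
            call["name"] = fn["name"]
        call["arguments"] += fn.get("arguments", "")  # concatenate fragments

args = json.loads(calls[0]["arguments"])  # parse only once the stream ends
print(calls[0]["name"], args)  # get_weather {'city': 'Paris'}
```

Note that the accumulated arguments string is only valid JSON once the stream finishes, so parsing should wait until then.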