Chat Completions
Create a chat completion by providing a list of messages as input. This is the primary endpoint for generating AI responses and supports streaming, tool use, RAG via knowledge bases, and structured JSON output.
Endpoint
POST https://api.tensoras.ai/v1/chat/completions
Authentication
Include your API key in the Authorization header:
Authorization: Bearer tns_your_key_here
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | The model to use. One of llama-3.3-70b, llama-3.1-8b, qwen-3-32b, mistral-7b-instruct, deepseek-r1-distill-70b, codestral-latest. |
| messages | array | Yes | — | A list of messages comprising the conversation. See message format below. |
| stream | boolean | No | false | If true, partial message deltas will be sent as server-sent events (SSE). |
| temperature | number | No | 1.0 | Sampling temperature between 0 and 2. Lower values make output more focused and deterministic. |
| max_tokens | integer | No | Model default | The maximum number of tokens to generate in the response. |
| top_p | number | No | 1.0 | Nucleus sampling parameter. We recommend altering this or temperature, but not both. |
| tools | array | No | — | A list of tools the model may call. See tool use below. |
| tool_choice | string or object | No | "auto" | Controls which tool is called. Options: "auto", "none", "required", or {"type": "function", "function": {"name": "my_function"}}. |
| response_format | object | No | — | Controls the output format. Set to {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for schema-constrained decoding. See Structured Outputs. |
| knowledge_bases | array | No | — | Tensoras extension for RAG. An array of knowledge base IDs to search for relevant context. See RAG with knowledge bases below. |
Message Format
Each message in the messages array is an object with the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | One of "system", "user", "assistant", or "tool". |
| content | string | Yes | The content of the message. |
| tool_calls | array | No | Tool calls generated by the model (for assistant messages). |
| tool_call_id | string | No | The ID of the tool call this message is responding to (for tool messages). |
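A multi-turn conversation exercising each role might look like the following sketch (the tool-call ID and tool result are illustrative, not real values):

```python
# A sketch of a multi-turn conversation using each message role.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather in Paris?"},
    # An assistant turn that requested a tool call (the ID is illustrative).
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [{
            "id": "call_123",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"},
        }],
    },
    # The tool result, linked back to the request via tool_call_id.
    {"role": "tool", "tool_call_id": "call_123", "content": "{\"temperature\": 18}"},
]
```

Note that a tool message must echo the id of the assistant tool call it answers.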
Response Body
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1709123456,
"model": "llama-3.3-70b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 9,
"total_tokens": 21
}
}
| Field | Type | Description |
|---|---|---|
| id | string | A unique identifier for the completion. |
| object | string | Always "chat.completion". |
| created | integer | Unix timestamp of when the completion was created. |
| model | string | The model used for the completion. |
| choices | array | A list of completion choices. |
| choices[].index | integer | The index of the choice in the list. |
| choices[].message | object | The generated message. |
| choices[].finish_reason | string | The reason the model stopped generating. One of "stop", "length", "tool_calls". |
| usage | object | Token usage statistics. |
| usage.prompt_tokens | integer | Number of tokens in the prompt. |
| usage.completion_tokens | integer | Number of tokens in the generated response. |
| usage.total_tokens | integer | Total tokens used (prompt + completion). |
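It is worth checking finish_reason before trusting the output. A minimal dispatcher over the fields above (a sketch; the function name is hypothetical):

```python
def handle_choice(choice: dict) -> tuple:
    """Dispatch on finish_reason; field names follow the response table above."""
    reason = choice["finish_reason"]
    if reason == "tool_calls":
        # The model wants to call a tool rather than answer directly.
        return ("tool_calls", choice["message"].get("tool_calls", []))
    if reason == "length":
        # Output was cut off at max_tokens; consider retrying with a larger limit.
        return ("truncated", choice["message"]["content"])
    return ("complete", choice["message"]["content"])
```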
Streaming
When stream is set to true, the response is delivered as server-sent events (SSE). Each event contains a JSON chunk with a delta of the response:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709123456,"model":"llama-3.3-70b","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709123456,"model":"llama-3.3-70b","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709123456,"model":"llama-3.3-70b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
The stream terminates with a data: [DONE] message.
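If you are not using an SDK, each data: line can be parsed directly. A minimal parser for the chunk format shown above (a sketch, not the only way to consume SSE):

```python
import json

def parse_sse_chunk(line: str):
    """Extract the content delta from one SSE line; None for [DONE], blanks, or empty deltas."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")
```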
Examples
Basic Chat Completion
curl
curl https://api.tensoras.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer tns_your_key_here" \
-d '{
"model": "llama-3.3-70b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
],
"temperature": 0.7,
"max_tokens": 256
}'
Python
from openai import OpenAI
client = OpenAI(
base_url="https://api.tensoras.ai/v1",
api_key="tns_your_key_here",
)
response = client.chat.completions.create(
model="llama-3.3-70b",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"},
],
temperature=0.7,
max_tokens=256,
)
print(response.choices[0].message.content)
Node.js
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.tensoras.ai/v1",
apiKey: "tns_your_key_here",
});
const response = await client.chat.completions.create({
model: "llama-3.3-70b",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "What is the capital of France?" },
],
temperature: 0.7,
max_tokens: 256,
});
console.log(response.choices[0].message.content);
Streaming
curl
curl https://api.tensoras.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer tns_your_key_here" \
-d '{
"model": "llama-3.3-70b",
"messages": [
{"role": "user", "content": "Write a short poem about the ocean."}
],
"stream": true
}'
Python
from openai import OpenAI
client = OpenAI(
base_url="https://api.tensoras.ai/v1",
api_key="tns_your_key_here",
)
stream = client.chat.completions.create(
model="llama-3.3-70b",
messages=[
{"role": "user", "content": "Write a short poem about the ocean."},
],
stream=True,
)
for chunk in stream:
if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
Node.js
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.tensoras.ai/v1",
apiKey: "tns_your_key_here",
});
const stream = await client.chat.completions.create({
model: "llama-3.3-70b",
messages: [
{ role: "user", content: "Write a short poem about the ocean." },
],
stream: true,
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) process.stdout.write(content);
}
Tool Use (Function Calling)
curl
curl https://api.tensoras.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer tns_your_key_here" \
-d '{
"model": "llama-3.3-70b",
"messages": [
{"role": "user", "content": "What is the weather in San Francisco?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a given location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use."
}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto"
}'
Python
import json
from openai import OpenAI
client = OpenAI(
base_url="https://api.tensoras.ai/v1",
api_key="tns_your_key_here",
)
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a given location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use.",
},
},
"required": ["location"],
},
},
}
]
messages = [{"role": "user", "content": "What is the weather in San Francisco?"}]
# First call: model decides to use the tool
response = client.chat.completions.create(
model="llama-3.3-70b",
messages=messages,
tools=tools,
tool_choice="auto",
)
tool_call = response.choices[0].message.tool_calls[0]
print(f"Tool called: {tool_call.function.name}")
print(f"Arguments: {tool_call.function.arguments}")
# Second call: provide the tool result back to the model
messages.append(response.choices[0].message)
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps({"temperature": 62, "unit": "fahrenheit", "condition": "foggy"}),
})
final_response = client.chat.completions.create(
model="llama-3.3-70b",
messages=messages,
tools=tools,
)
print(final_response.choices[0].message.content)
Node.js
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.tensoras.ai/v1",
apiKey: "tns_your_key_here",
});
const tools = [
{
type: "function",
function: {
name: "get_weather",
description: "Get the current weather for a given location.",
parameters: {
type: "object",
properties: {
location: {
type: "string",
description: "The city and state, e.g. San Francisco, CA",
},
unit: {
type: "string",
enum: ["celsius", "fahrenheit"],
description: "The temperature unit to use.",
},
},
required: ["location"],
},
},
},
];
const messages = [
{ role: "user", content: "What is the weather in San Francisco?" },
];
// First call: model decides to use the tool
const response = await client.chat.completions.create({
model: "llama-3.3-70b",
messages,
tools,
tool_choice: "auto",
});
const toolCall = response.choices[0].message.tool_calls[0];
console.log(`Tool called: ${toolCall.function.name}`);
console.log(`Arguments: ${toolCall.function.arguments}`);
// Second call: provide the tool result back to the model
messages.push(response.choices[0].message);
messages.push({
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify({
temperature: 62,
unit: "fahrenheit",
condition: "foggy",
}),
});
const finalResponse = await client.chat.completions.create({
model: "llama-3.3-70b",
messages,
tools,
});
console.log(finalResponse.choices[0].message.content);
RAG with Knowledge Bases
The knowledge_bases parameter is a Tensoras extension that automatically retrieves relevant context from your knowledge bases and injects it into the prompt. This provides built-in RAG without requiring a separate retrieval step.
curl
curl https://api.tensoras.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer tns_your_key_here" \
-d '{
"model": "llama-3.3-70b",
"messages": [
{"role": "system", "content": "Answer questions using the provided context. If the answer is not in the context, say so."},
{"role": "user", "content": "What is our refund policy?"}
],
"knowledge_bases": ["kb_abc123", "kb_def456"],
"temperature": 0.3
}'
Python
from openai import OpenAI
client = OpenAI(
base_url="https://api.tensoras.ai/v1",
api_key="tns_your_key_here",
)
response = client.chat.completions.create(
model="llama-3.3-70b",
messages=[
{
"role": "system",
"content": "Answer questions using the provided context. If the answer is not in the context, say so.",
},
{"role": "user", "content": "What is our refund policy?"},
],
extra_body={
"knowledge_bases": ["kb_abc123", "kb_def456"],
},
temperature=0.3,
)
print(response.choices[0].message.content)
Node.js
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.tensoras.ai/v1",
apiKey: "tns_your_key_here",
});
const response = await client.chat.completions.create({
model: "llama-3.3-70b",
messages: [
{
role: "system",
content:
"Answer questions using the provided context. If the answer is not in the context, say so.",
},
{ role: "user", content: "What is our refund policy?" },
],
knowledge_bases: ["kb_abc123", "kb_def456"],
temperature: 0.3,
});
console.log(response.choices[0].message.content);
JSON Mode
Force the model to produce valid JSON output by setting response_format to {"type": "json_object"}. You must also instruct the model to produce JSON in the system or user message.
curl
curl https://api.tensoras.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer tns_your_key_here" \
-d '{
"model": "llama-3.3-70b",
"messages": [
{"role": "system", "content": "You are a helpful assistant that outputs JSON."},
{"role": "user", "content": "List the top 3 planets by size with their diameters in km."}
],
"response_format": {"type": "json_object"},
"temperature": 0.5
}'
Python
from openai import OpenAI
client = OpenAI(
base_url="https://api.tensoras.ai/v1",
api_key="tns_your_key_here",
)
response = client.chat.completions.create(
model="llama-3.3-70b",
messages=[
{"role": "system", "content": "You are a helpful assistant that outputs JSON."},
{"role": "user", "content": "List the top 3 planets by size with their diameters in km."},
],
response_format={"type": "json_object"},
temperature=0.5,
)
print(response.choices[0].message.content)
# {"planets": [{"name": "Jupiter", "diameter_km": 139820}, ...]}
Node.js
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.tensoras.ai/v1",
apiKey: "tns_your_key_here",
});
const response = await client.chat.completions.create({
model: "llama-3.3-70b",
messages: [
{ role: "system", content: "You are a helpful assistant that outputs JSON." },
{ role: "user", content: "List the top 3 planets by size with their diameters in km." },
],
response_format: { type: "json_object" },
temperature: 0.5,
});
console.log(response.choices[0].message.content);
JSON Schema Mode (Structured Outputs)
For guaranteed schema conformance, use response_format with type: "json_schema". The model output is constrained at the token level to match your schema exactly.
curl
curl https://api.tensoras.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer tns_your_key_here" \
-d '{
"model": "llama-3.3-70b",
"messages": [
{"role": "system", "content": "Extract structured data from the user query."},
{"role": "user", "content": "Tell me about the movie Inception."}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "movie",
"strict": true,
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"year": {"type": "integer"},
"genre": {"type": "string"},
"director": {"type": "string"}
},
"required": ["name", "year", "genre", "director"],
"additionalProperties": false
}
}
}
}'
Python
from openai import OpenAI
client = OpenAI(
base_url="https://api.tensoras.ai/v1",
api_key="tns_your_key_here",
)
response = client.chat.completions.create(
model="llama-3.3-70b",
messages=[
{"role": "system", "content": "Extract structured data from the user query."},
{"role": "user", "content": "Tell me about the movie Inception."},
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "movie",
"strict": True,
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"year": {"type": "integer"},
"genre": {"type": "string"},
"director": {"type": "string"},
},
"required": ["name", "year", "genre", "director"],
"additionalProperties": False,
},
},
},
)
print(response.choices[0].message.content)
# {"name": "Inception", "year": 2010, "genre": "Science Fiction", "director": "Christopher Nolan"}
Node.js
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.tensoras.ai/v1",
apiKey: "tns_your_key_here",
});
const response = await client.chat.completions.create({
model: "llama-3.3-70b",
messages: [
{ role: "system", content: "Extract structured data from the user query." },
{ role: "user", content: "Tell me about the movie Inception." },
],
response_format: {
type: "json_schema",
json_schema: {
name: "movie",
strict: true,
schema: {
type: "object",
properties: {
name: { type: "string" },
year: { type: "integer" },
genre: { type: "string" },
director: { type: "string" },
},
required: ["name", "year", "genre", "director"],
additionalProperties: false,
},
},
},
});
console.log(response.choices[0].message.content);
For more details on schema support, strict mode, and migration, see Structured Outputs.
Error Handling
Errors follow a standard format:
{
"error": {
"message": "Invalid model: unknown-model",
"type": "invalid_request_error",
"param": "model",
"code": "model_not_found"
}
}
Common error codes:
| HTTP Status | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | The request body is malformed or missing required fields. |
| 401 | authentication_error | Invalid or missing API key. |
| 404 | not_found_error | The specified model was not found. |
| 429 | rate_limit_error | You have exceeded the rate limit. Retry after the time specified in the Retry-After header. |
| 500 | server_error | An internal server error occurred. |
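When handling a 429, the wait before retrying can be taken from the Retry-After header, with an exponential backoff as a fallback. One way to compute the delay (a sketch; the function name and defaults are illustrative):

```python
def retry_delay(headers: dict, attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying a 429, preferring the Retry-After header."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Non-numeric (HTTP-date) values fall through to backoff.
    # No usable header: exponential backoff capped at `cap` seconds.
    return min(base * (2 ** attempt), cap)
```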