
Chat Completions

Create a chat completion by providing a list of messages as input. This is the primary endpoint for generating AI responses and supports streaming, tool use, RAG via knowledge bases, and structured JSON output.

Endpoint

POST https://api.tensoras.ai/v1/chat/completions

Authentication

Include your API key in the Authorization header:

Authorization: Bearer tns_your_key_here

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | — | The model to use. One of llama-3.3-70b, llama-3.1-8b, qwen-3-32b, mistral-7b-instruct, deepseek-r1-distill-70b, codestral-latest. |
| messages | array | Yes | — | A list of messages comprising the conversation. See message format below. |
| stream | boolean | No | false | If true, partial message deltas will be sent as server-sent events (SSE). |
| temperature | number | No | 1.0 | Sampling temperature between 0 and 2. Lower values make output more focused and deterministic. |
| max_tokens | integer | No | Model default | The maximum number of tokens to generate in the response. |
| top_p | number | No | 1.0 | Nucleus sampling parameter. We recommend altering this or temperature, but not both. |
| tools | array | No | — | A list of tools the model may call. See tool use below. |
| tool_choice | string or object | No | "auto" | Controls which tool is called. Options: "auto", "none", "required", or {"type": "function", "function": {"name": "my_function"}}. |
| response_format | object | No | — | Controls the output format. Set to {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for schema-constrained decoding. See Structured Outputs. |
| knowledge_bases | array | No | — | Tensoras extension for RAG. An array of knowledge base IDs to search for relevant context. See RAG with knowledge bases below. |

Message Format

Each message in the messages array is an object with the following fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Yes | One of "system", "user", "assistant", or "tool". |
| content | string | Yes | The content of the message. |
| tool_calls | array | No | Tool calls generated by the model (for assistant messages). |
| tool_call_id | string | No | The ID of the tool call this message is responding to (for tool messages). |
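As a concrete example, here is a messages array for a conversation that has completed one tool round-trip. The tool call ID and the get_weather function are illustrative; following the OpenAI-compatible convention, the assistant message carries the tool call while the tool message returns the result:

```json
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "What is the weather in Paris?"},
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"}
      }
    ]
  },
  {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "{\"temperature\": 18, \"unit\": \"celsius\"}"
  }
]
```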

Response Body

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709123456,
  "model": "llama-3.3-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
| Field | Type | Description |
| --- | --- | --- |
| id | string | A unique identifier for the completion. |
| object | string | Always "chat.completion". |
| created | integer | Unix timestamp of when the completion was created. |
| model | string | The model used for the completion. |
| choices | array | A list of completion choices. |
| choices[].index | integer | The index of the choice in the list. |
| choices[].message | object | The generated message. |
| choices[].finish_reason | string | The reason the model stopped generating. One of "stop", "length", "tool_calls". |
| usage | object | Token usage statistics. |
| usage.prompt_tokens | integer | Number of tokens in the prompt. |
| usage.completion_tokens | integer | Number of tokens in the generated response. |
| usage.total_tokens | integer | Total tokens used (prompt + completion). |

Streaming

When stream is set to true, the response is delivered as server-sent events (SSE). Each event contains a JSON chunk with a delta of the response:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709123456,"model":"llama-3.3-70b","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709123456,"model":"llama-3.3-70b","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709123456,"model":"llama-3.3-70b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

The stream terminates with a data: [DONE] message.
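If you are consuming the stream without an SDK, the events can be parsed directly. The following is a minimal sketch based on the chunk format shown above; the helper name extract_deltas is illustrative:

```python
import json

def extract_deltas(sse_lines):
    """Collect content deltas from raw chat.completion.chunk SSE lines."""
    deltas = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            deltas.append(delta["content"])
    return deltas

# In practice, feed this the decoded lines of the HTTP response body,
# e.g. requests.post(..., stream=True).iter_lines() decoded to str.
```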

Examples

Basic Chat Completion

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_tokens=256,
)
 
print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
  temperature: 0.7,
  max_tokens: 256,
});
 
console.log(response.choices[0].message.content);

Streaming

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "user", "content": "Write a short poem about the ocean."}
    ],
    "stream": true
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Write a short poem about the ocean."},
    ],
    stream=True,
)
 
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const stream = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "user", content: "Write a short poem about the ocean." },
  ],
  stream: true,
});
 
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Tool Use (Function Calling)

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "user", "content": "What is the weather in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a given location.",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "The temperature unit to use."
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

Python

import json
from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use.",
                    },
                },
                "required": ["location"],
            },
        },
    }
]
 
messages = [{"role": "user", "content": "What is the weather in San Francisco?"}]
 
# First call: model decides to use the tool
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
 
tool_call = response.choices[0].message.tool_calls[0]
print(f"Tool called: {tool_call.function.name}")
print(f"Arguments: {tool_call.function.arguments}")
 
# Second call: provide the tool result back to the model
messages.append(response.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps({"temperature": 62, "unit": "fahrenheit", "condition": "foggy"}),
})
 
final_response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=messages,
    tools=tools,
)
 
print(final_response.choices[0].message.content)

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a given location.",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA",
          },
          unit: {
            type: "string",
            enum: ["celsius", "fahrenheit"],
            description: "The temperature unit to use.",
          },
        },
        required: ["location"],
      },
    },
  },
];
 
const messages = [
  { role: "user", content: "What is the weather in San Francisco?" },
];
 
// First call: model decides to use the tool
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages,
  tools,
  tool_choice: "auto",
});
 
const toolCall = response.choices[0].message.tool_calls[0];
console.log(`Tool called: ${toolCall.function.name}`);
console.log(`Arguments: ${toolCall.function.arguments}`);
 
// Second call: provide the tool result back to the model
messages.push(response.choices[0].message);
messages.push({
  role: "tool",
  tool_call_id: toolCall.id,
  content: JSON.stringify({
    temperature: 62,
    unit: "fahrenheit",
    condition: "foggy",
  }),
});
 
const finalResponse = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages,
  tools,
});
 
console.log(finalResponse.choices[0].message.content);
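The examples above hardcode the tool result for brevity. In a real application you would dispatch on the returned function name and parse the JSON-encoded arguments before calling your own code. A small sketch (dispatch_tool_call and the registry are illustrative, not part of the API):

```python
import json

def dispatch_tool_call(name, arguments_json, registry):
    """Look up a tool by name and invoke it with the model-provided arguments.

    registry maps tool names to Python callables; the model returns
    arguments as a JSON-encoded string, so they must be parsed first.
    """
    if name not in registry:
        raise ValueError(f"Unknown tool: {name}")
    args = json.loads(arguments_json)
    return registry[name](**args)
```

For the weather example, the result of dispatch_tool_call would then be serialized with json.dumps and sent back as the content of a tool message.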

RAG with Knowledge Bases

The knowledge_bases parameter is a Tensoras extension that automatically retrieves relevant context from your knowledge bases and injects it into the prompt. This provides built-in RAG without requiring a separate retrieval step.

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "Answer questions using the provided context. If the answer is not in the context, say so."},
      {"role": "user", "content": "What is our refund policy?"}
    ],
    "knowledge_bases": ["kb_abc123", "kb_def456"],
    "temperature": 0.3
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {
            "role": "system",
            "content": "Answer questions using the provided context. If the answer is not in the context, say so.",
        },
        {"role": "user", "content": "What is our refund policy?"},
    ],
    extra_body={
        "knowledge_bases": ["kb_abc123", "kb_def456"],
    },
    temperature=0.3,
)
 
print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    {
      role: "system",
      content:
        "Answer questions using the provided context. If the answer is not in the context, say so.",
    },
    { role: "user", content: "What is our refund policy?" },
  ],
  knowledge_bases: ["kb_abc123", "kb_def456"],
  temperature: 0.3,
});
 
console.log(response.choices[0].message.content);

JSON Mode

Force the model to produce valid JSON output by setting response_format to {"type": "json_object"}. You must also instruct the model to produce JSON in the system or user message.

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant that outputs JSON."},
      {"role": "user", "content": "List the top 3 planets by size with their diameters in km."}
    ],
    "response_format": {"type": "json_object"},
    "temperature": 0.5
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that outputs JSON."},
        {"role": "user", "content": "List the top 3 planets by size with their diameters in km."},
    ],
    response_format={"type": "json_object"},
    temperature=0.5,
)
 
print(response.choices[0].message.content)
# {"planets": [{"name": "Jupiter", "diameter_km": 139820}, ...]}

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "system", content: "You are a helpful assistant that outputs JSON." },
    { role: "user", content: "List the top 3 planets by size with their diameters in km." },
  ],
  response_format: { type: "json_object" },
  temperature: 0.5,
});
 
console.log(response.choices[0].message.content);

JSON Schema Mode (Structured Outputs)

For guaranteed schema conformance, use response_format with type: "json_schema". The model output is constrained at the token level to match your schema exactly.

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "Extract structured data from the user query."},
      {"role": "user", "content": "Tell me about the movie Inception."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "movie",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "year": {"type": "integer"},
            "genre": {"type": "string"},
            "director": {"type": "string"}
          },
          "required": ["name", "year", "genre", "director"],
          "additionalProperties": false
        }
      }
    }
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "Extract structured data from the user query."},
        {"role": "user", "content": "Tell me about the movie Inception."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "movie",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "year": {"type": "integer"},
                    "genre": {"type": "string"},
                    "director": {"type": "string"},
                },
                "required": ["name", "year", "genre", "director"],
                "additionalProperties": False,
            },
        },
    },
)
 
print(response.choices[0].message.content)
# {"name": "Inception", "year": 2010, "genre": "Science Fiction", "director": "Christopher Nolan"}

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "system", content: "Extract structured data from the user query." },
    { role: "user", content: "Tell me about the movie Inception." },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "movie",
      strict: true,
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          year: { type: "integer" },
          genre: { type: "string" },
          director: { type: "string" },
        },
        required: ["name", "year", "genre", "director"],
        additionalProperties: false,
      },
    },
  },
});
 
console.log(response.choices[0].message.content);

For more details on schema support, strict mode, and migration, see Structured Outputs.

Error Handling

Errors follow a standard format:

{
  "error": {
    "message": "Invalid model: unknown-model",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}

Common errors:

| HTTP Status | Error Type | Description |
| --- | --- | --- |
| 400 | invalid_request_error | The request body is malformed or missing required fields. |
| 401 | authentication_error | Invalid or missing API key. |
| 404 | not_found_error | The specified model was not found. |
| 429 | rate_limit_error | You have exceeded the rate limit. Retry after the time specified in the Retry-After header. |
| 500 | server_error | An internal server error occurred. |
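A minimal retry sketch that honors the Retry-After header on 429 responses. Helper names like create_with_retry are illustrative; the HTTP poster and sleep function are injectable so the logic can be tested without a live server:

```python
import json
import time
import urllib.error
import urllib.request

API_URL = "https://api.tensoras.ai/v1/chat/completions"

def _http_post(url, headers, body):
    """Standard-library HTTP POST; returns (status, headers, parsed JSON body)."""
    req = urllib.request.Request(
        url, data=json.dumps(body).encode(), headers=headers, method="POST"
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, dict(resp.headers), json.load(resp)
    except urllib.error.HTTPError as e:
        return e.code, dict(e.headers), json.load(e)

def create_with_retry(payload, api_key, max_retries=3, post=_http_post, sleep=time.sleep):
    """Create a chat completion, retrying 429s per the Retry-After header."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    for attempt in range(max_retries + 1):
        status, resp_headers, body = post(API_URL, headers, payload)
        if status == 429 and attempt < max_retries:
            # Honor the server-suggested delay; fall back to exponential backoff.
            sleep(float(resp_headers.get("Retry-After", 2 ** attempt)))
            continue
        if status >= 400:
            err = body.get("error", {})
            raise RuntimeError(f"{err.get('type')}: {err.get('message')}")
        return body
```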