
Chat Completions

Create a chat completion by providing a list of messages as input. This is the primary endpoint for generating AI responses and supports streaming, tool use, RAG via knowledge bases, and structured JSON output.

Endpoint

POST https://api.tensoras.ai/v1/chat/completions

Authentication

Include your API key in the Authorization header:

Authorization: Bearer tns_your_key_here

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | — | The model to use. One of llama-3.3-70b, llama-3.1-8b, qwen-3-32b, mistral-7b-instruct, deepseek-r1-distill-70b, codestral-latest. |
| messages | array | Yes | — | A list of messages comprising the conversation. See message format below. |
| stream | boolean | No | false | If true, partial message deltas will be sent as server-sent events (SSE). |
| temperature | number | No | 1.0 | Sampling temperature between 0 and 2. Lower values make output more focused and deterministic. |
| max_tokens | integer | No | Model default | The maximum number of tokens to generate in the response. |
| top_p | number | No | 1.0 | Nucleus sampling parameter. We recommend altering this or temperature, but not both. |
| tools | array | No | — | A list of tools the model may call. See tool use below. |
| tool_choice | string or object | No | "auto" | Controls which tool is called. Options: "auto", "none", "required", or {"type": "function", "function": {"name": "my_function"}}. |
| response_format | object | No | — | Controls the output format. Set to {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for schema-constrained decoding. See Structured Outputs. |
| knowledge_bases | array | No | — | Tensoras extension for RAG. An array of knowledge base IDs to search for relevant context. See RAG with knowledge bases below. |

Message Format

Each message in the messages array is an object with the following fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Yes | One of "system", "user", "assistant", or "tool". |
| content | string | Yes | The content of the message. |
| tool_calls | array | No | Tool calls generated by the model (for assistant messages). |
| tool_call_id | string | No | The ID of the tool call this message is responding to (for tool messages). |
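As a concrete example, here is a messages array for a conversation that has completed one tool round-trip. The tool call ID and the get_weather function are illustrative; following the OpenAI-compatible convention, the assistant message carries the tool call while the tool message returns the result:

```json
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "What is the weather in Paris?"},
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"}
      }
    ]
  },
  {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "{\"temperature\": 18, \"unit\": \"celsius\"}"
  }
]
```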

Response Body

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709123456,
  "model": "llama-3.3-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
| Field | Type | Description |
| --- | --- | --- |
| id | string | A unique identifier for the completion. |
| object | string | Always "chat.completion". |
| created | integer | Unix timestamp of when the completion was created. |
| model | string | The model used for the completion. |
| choices | array | A list of completion choices. |
| choices[].index | integer | The index of the choice in the list. |
| choices[].message | object | The generated message. |
| choices[].finish_reason | string | The reason the model stopped generating. One of "stop", "length", "tool_calls". |
| usage | object | Token usage statistics. |
| usage.prompt_tokens | integer | Number of tokens in the prompt. |
| usage.completion_tokens | integer | Number of tokens in the generated response. |
| usage.total_tokens | integer | Total tokens used (prompt + completion). |

Streaming

When stream is set to true, the response is delivered as server-sent events (SSE). Each event contains a JSON chunk with a delta of the response:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709123456,"model":"llama-3.3-70b","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709123456,"model":"llama-3.3-70b","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709123456,"model":"llama-3.3-70b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

The stream terminates with a data: [DONE] message.
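If you are consuming the stream without an SDK, the events can be parsed directly. The following is a minimal sketch based on the chunk format shown above; the helper name extract_deltas is illustrative:

```python
import json

def extract_deltas(sse_lines):
    """Collect content deltas from raw chat.completion.chunk SSE lines."""
    deltas = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            deltas.append(delta["content"])
    return deltas

# In practice, feed this the decoded lines of the HTTP response body,
# e.g. requests.post(..., stream=True).iter_lines() decoded to str.
```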

Examples

Basic Chat Completion

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_tokens=256,
)
 
print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
  temperature: 0.7,
  max_tokens: 256,
});
 
console.log(response.choices[0].message.content);

Streaming

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "user", "content": "Write a short poem about the ocean."}
    ],
    "stream": true
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Write a short poem about the ocean."},
    ],
    stream=True,
)
 
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const stream = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "user", content: "Write a short poem about the ocean." },
  ],
  stream: true,
});
 
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Tool Use (Function Calling)

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "user", "content": "What is the weather in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a given location.",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "The temperature unit to use."
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

Python

import json
from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use.",
                    },
                },
                "required": ["location"],
            },
        },
    }
]
 
messages = [{"role": "user", "content": "What is the weather in San Francisco?"}]
 
# First call: model decides to use the tool
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
 
tool_call = response.choices[0].message.tool_calls[0]
print(f"Tool called: {tool_call.function.name}")
print(f"Arguments: {tool_call.function.arguments}")
 
# Second call: provide the tool result back to the model
messages.append(response.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps({"temperature": 62, "unit": "fahrenheit", "condition": "foggy"}),
})
 
final_response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=messages,
    tools=tools,
)
 
print(final_response.choices[0].message.content)

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a given location.",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA",
          },
          unit: {
            type: "string",
            enum: ["celsius", "fahrenheit"],
            description: "The temperature unit to use.",
          },
        },
        required: ["location"],
      },
    },
  },
];
 
const messages = [
  { role: "user", content: "What is the weather in San Francisco?" },
];
 
// First call: model decides to use the tool
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages,
  tools,
  tool_choice: "auto",
});
 
const toolCall = response.choices[0].message.tool_calls[0];
console.log(`Tool called: ${toolCall.function.name}`);
console.log(`Arguments: ${toolCall.function.arguments}`);
 
// Second call: provide the tool result back to the model
messages.push(response.choices[0].message);
messages.push({
  role: "tool",
  tool_call_id: toolCall.id,
  content: JSON.stringify({
    temperature: 62,
    unit: "fahrenheit",
    condition: "foggy",
  }),
});
 
const finalResponse = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages,
  tools,
});
 
console.log(finalResponse.choices[0].message.content);
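The examples above hardcode the tool result for brevity. In a real application you would dispatch on the returned function name and parse the JSON-encoded arguments before calling your own code. A small sketch (dispatch_tool_call and the registry are illustrative, not part of the API):

```python
import json

def dispatch_tool_call(name, arguments_json, registry):
    """Look up a tool by name and invoke it with the model-provided arguments.

    registry maps tool names to Python callables; the model returns
    arguments as a JSON-encoded string, so they must be parsed first.
    """
    if name not in registry:
        raise ValueError(f"Unknown tool: {name}")
    args = json.loads(arguments_json)
    return registry[name](**args)
```

For the weather example, the result of dispatch_tool_call would then be serialized with json.dumps and sent back as the content of a tool message.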

RAG with Knowledge Bases

The knowledge_bases parameter is a Tensoras extension that automatically retrieves relevant context from your knowledge bases and injects it into the prompt. This provides built-in RAG without requiring a separate retrieval step.

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "Answer questions using the provided context. If the answer is not in the context, say so."},
      {"role": "user", "content": "What is our refund policy?"}
    ],
    "knowledge_bases": ["kb_abc123", "kb_def456"],
    "temperature": 0.3
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {
            "role": "system",
            "content": "Answer questions using the provided context. If the answer is not in the context, say so.",
        },
        {"role": "user", "content": "What is our refund policy?"},
    ],
    extra_body={
        "knowledge_bases": ["kb_abc123", "kb_def456"],
    },
    temperature=0.3,
)
 
print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    {
      role: "system",
      content:
        "Answer questions using the provided context. If the answer is not in the context, say so.",
    },
    { role: "user", content: "What is our refund policy?" },
  ],
  knowledge_bases: ["kb_abc123", "kb_def456"],
  temperature: 0.3,
});
 
console.log(response.choices[0].message.content);

JSON Mode

Force the model to produce valid JSON output by setting response_format to {"type": "json_object"}. You must also instruct the model to produce JSON in the system or user message.

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant that outputs JSON."},
      {"role": "user", "content": "List the top 3 planets by size with their diameters in km."}
    ],
    "response_format": {"type": "json_object"},
    "temperature": 0.5
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that outputs JSON."},
        {"role": "user", "content": "List the top 3 planets by size with their diameters in km."},
    ],
    response_format={"type": "json_object"},
    temperature=0.5,
)
 
print(response.choices[0].message.content)
# {"planets": [{"name": "Jupiter", "diameter_km": 139820}, ...]}

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "system", content: "You are a helpful assistant that outputs JSON." },
    { role: "user", content: "List the top 3 planets by size with their diameters in km." },
  ],
  response_format: { type: "json_object" },
  temperature: 0.5,
});
 
console.log(response.choices[0].message.content);

JSON Schema Mode (Structured Outputs)

For guaranteed schema conformance, use response_format with type: "json_schema". The model output is constrained at the token level to match your schema exactly.

curl

curl https://api.tensoras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "Extract structured data from the user query."},
      {"role": "user", "content": "Tell me about the movie Inception."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "movie",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "year": {"type": "integer"},
            "genre": {"type": "string"},
            "director": {"type": "string"}
          },
          "required": ["name", "year", "genre", "director"],
          "additionalProperties": false
        }
      }
    }
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "Extract structured data from the user query."},
        {"role": "user", "content": "Tell me about the movie Inception."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "movie",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "year": {"type": "integer"},
                    "genre": {"type": "string"},
                    "director": {"type": "string"},
                },
                "required": ["name", "year", "genre", "director"],
                "additionalProperties": False,
            },
        },
    },
)
 
print(response.choices[0].message.content)
# {"name": "Inception", "year": 2010, "genre": "Science Fiction", "director": "Christopher Nolan"}

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "system", content: "Extract structured data from the user query." },
    { role: "user", content: "Tell me about the movie Inception." },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "movie",
      strict: true,
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          year: { type: "integer" },
          genre: { type: "string" },
          director: { type: "string" },
        },
        required: ["name", "year", "genre", "director"],
        additionalProperties: false,
      },
    },
  },
});
 
console.log(response.choices[0].message.content);

For more details on schema support, strict mode, and migration, see Structured Outputs.

Error Handling

Errors follow a standard format:

{
  "error": {
    "message": "Invalid model: unknown-model",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}

Common errors:

| HTTP Status | Error Type | Description |
| --- | --- | --- |
| 400 | invalid_request_error | The request body is malformed or missing required fields. |
| 401 | authentication_error | Invalid or missing API key. |
| 404 | not_found_error | The specified model was not found. |
| 429 | rate_limit_error | You have exceeded the rate limit. Retry after the time specified in the Retry-After header. |
| 500 | server_error | An internal server error occurred. |
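A minimal retry sketch that honors the Retry-After header on 429 responses. Helper names like create_with_retry are illustrative; the HTTP poster and sleep function are injectable so the logic can be tested without a live server:

```python
import json
import time
import urllib.error
import urllib.request

API_URL = "https://api.tensoras.ai/v1/chat/completions"

def _http_post(url, headers, body):
    """Standard-library HTTP POST; returns (status, headers, parsed JSON body)."""
    req = urllib.request.Request(
        url, data=json.dumps(body).encode(), headers=headers, method="POST"
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, dict(resp.headers), json.load(resp)
    except urllib.error.HTTPError as e:
        return e.code, dict(e.headers), json.load(e)

def create_with_retry(payload, api_key, max_retries=3, post=_http_post, sleep=time.sleep):
    """Create a chat completion, retrying 429s per the Retry-After header."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    for attempt in range(max_retries + 1):
        status, resp_headers, body = post(API_URL, headers, payload)
        if status == 429 and attempt < max_retries:
            # Honor the server-suggested delay; fall back to exponential backoff.
            sleep(float(resp_headers.get("Retry-After", 2 ** attempt)))
            continue
        if status >= 400:
            err = body.get("error", {})
            raise RuntimeError(f"{err.get('type')}: {err.get('message')}")
        return body
```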