
Completions (Legacy)

Generate a text completion for a given prompt. This is a legacy endpoint maintained for backward compatibility. For new integrations, we recommend using the Chat Completions endpoint instead.

Note: The Chat Completions API (/v1/chat/completions) is the preferred way to interact with Tensoras models. It supports multi-turn conversations, tool use, retrieval-augmented generation (RAG), and structured output. This legacy endpoint provides basic prompt-in, completion-out functionality only.

Endpoint

POST https://api.tensoras.ai/v1/completions

Authentication

Authorization: Bearer tns_your_key_here

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model` | string | Yes | — | The model to use. One of `llama-3.3-70b`, `llama-3.1-8b`, `qwen-3-32b`, `mistral-7b-instruct`, `deepseek-r1-distill-70b`, `codestral-latest`. |
| `prompt` | string or array | Yes | — | The prompt(s) to generate completions for. Can be a string or an array of strings. |
| `max_tokens` | integer | No | 16 | The maximum number of tokens to generate. |
| `temperature` | number | No | 1.0 | Sampling temperature between 0 and 2. |
| `top_p` | number | No | 1.0 | Nucleus sampling parameter. |
| `n` | integer | No | 1 | Number of completions to generate for each prompt. |
| `stream` | boolean | No | false | Whether to stream partial results as server-sent events. |
| `stop` | string or array | No | — | Up to 4 sequences where the model will stop generating. |
| `presence_penalty` | number | No | 0.0 | Penalizes new tokens based on whether they appear in the text so far. Range: -2.0 to 2.0. |
| `frequency_penalty` | number | No | 0.0 | Penalizes new tokens based on their frequency in the text so far. Range: -2.0 to 2.0. |
| `logprobs` | integer | No | — | Include the log probabilities on the most likely tokens. Max value: 5. |
| `suffix` | string | No | — | The suffix that comes after the completion. |
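The documented ranges above can be checked client-side before sending a request. The helper below is an illustrative sketch only (the function name and checks are ours, not part of the API, which performs its own server-side validation):

```python
def build_completion_request(model, prompt, **params):
    """Assemble a /v1/completions request body, validating documented ranges.

    Illustrative helper only -- not part of any SDK.
    """
    if not isinstance(prompt, (str, list)):
        raise TypeError("prompt must be a string or an array of strings")
    body = {"model": model, "prompt": prompt}
    # Ranges taken from the parameter table above.
    ranges = {
        "temperature": (0.0, 2.0),
        "top_p": (0.0, 1.0),
        "presence_penalty": (-2.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
        "logprobs": (0, 5),
    }
    for name, value in params.items():
        if name in ranges:
            lo, hi = ranges[name]
            if not lo <= value <= hi:
                raise ValueError(f"{name} must be between {lo} and {hi}")
        body[name] = value
    return body
```

A request that exceeds a documented range (for example, `temperature=3.0`) raises `ValueError` locally instead of producing a 400 from the server.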

Response Body

{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1709123456,
  "model": "llama-3.3-70b",
  "choices": [
    {
      "text": " is the capital of France and is known for the Eiffel Tower.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 14,
    "total_tokens": 19
  }
}

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | A unique identifier for the completion. |
| `object` | string | Always `"text_completion"`. |
| `created` | integer | Unix timestamp of when the completion was created. |
| `model` | string | The model used for the completion. |
| `choices` | array | A list of generated completions. |
| `choices[].text` | string | The generated text. |
| `choices[].index` | integer | The index of the choice. |
| `choices[].logprobs` | object or null | Log probability information, if requested. |
| `choices[].finish_reason` | string | Why the model stopped. One of `"stop"`, `"length"`. |
| `usage` | object | Token usage statistics. |
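A `finish_reason` of `"length"` means generation stopped at `max_tokens` rather than at a natural endpoint, so the text may be cut off mid-sentence. A minimal sketch for reading a choice from the parsed JSON body and flagging truncation (the helper is illustrative, not part of any SDK):

```python
# Parsed response body, matching the example shown above.
sample_response = {
    "choices": [
        {
            "text": " is the capital of France and is known for the Eiffel Tower.",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
}

def completion_text(response, index=0):
    """Return one choice's text, warning when it was truncated at max_tokens."""
    choice = response["choices"][index]
    if choice["finish_reason"] == "length":
        # Hit the max_tokens budget; consider retrying with a larger value.
        print("warning: completion truncated at max_tokens")
    return choice["text"]
```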

Examples

Basic Completion

curl

curl https://api.tensoras.ai/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "prompt": "Paris",
    "max_tokens": 64,
    "temperature": 0.7
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
response = client.completions.create(
    model="llama-3.3-70b",
    prompt="Paris",
    max_tokens=64,
    temperature=0.7,
)
 
print(response.choices[0].text)

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const response = await client.completions.create({
  model: "llama-3.3-70b",
  prompt: "Paris",
  max_tokens: 64,
  temperature: 0.7,
});
 
console.log(response.choices[0].text);

Streaming Completion

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
stream = client.completions.create(
    model="llama-3.3-70b",
    prompt="Once upon a time",
    max_tokens=128,
    stream=True,
)
 
for chunk in stream:
    print(chunk.choices[0].text, end="")
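If the full text is needed after streaming, the per-chunk deltas can be collected as they arrive. A minimal sketch, assuming each chunk exposes `choices[0].text` as in the loop above; the `SimpleNamespace` objects below only stand in for SDK chunk objects so the snippet runs without a network call:

```python
from types import SimpleNamespace

def accumulate(stream):
    """Concatenate the text deltas from a completions stream into one string."""
    return "".join(chunk.choices[0].text for chunk in stream)

# Illustrative stand-ins for streamed chunk objects (no network required).
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(text=t)])
    for t in ("Once", " upon", " a", " time")
]
```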

Migrating to Chat Completions

To migrate from the Completions API to Chat Completions, convert your prompt to a messages array:

# Legacy completions
response = client.completions.create(
    model="llama-3.3-70b",
    prompt="Explain quantum computing in simple terms.",
)
 
# Equivalent chat completions (recommended)
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
)
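The conversion can be wrapped in a small helper. This is an illustrative sketch (the function name is ours, not part of any SDK); note that a legacy array-of-prompts request maps to one chat request per prompt, since Chat Completions takes a single `messages` array per call:

```python
def to_chat_messages(prompt):
    """Convert a legacy completions prompt into a chat messages array.

    A string becomes a single-message array; an array of prompts becomes
    one messages array per prompt (one chat request each).
    """
    if isinstance(prompt, list):
        return [to_chat_messages(p) for p in prompt]
    return [{"role": "user", "content": prompt}]
```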

Error Handling

Errors follow a standard format:

{
  "error": {
    "message": "Invalid model: unknown-model",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}
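
The error envelope above can be turned into a readable log line before raising or retrying. A minimal sketch over the documented fields (the helper name is ours; `param` and `code` may be null for some error types, so they are treated as optional):

```python
def describe_error(body):
    """Format the standard error envelope into a single log line."""
    err = body["error"]
    line = f"{err['type']}: {err['message']}"
    if err.get("param"):
        line += f" (param: {err['param']})"
    if err.get("code"):
        line += f" [code: {err['code']}]"
    return line

# The example error body shown above.
sample_error = {
    "error": {
        "message": "Invalid model: unknown-model",
        "type": "invalid_request_error",
        "param": "model",
        "code": "model_not_found",
    }
}
```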