Completions (Legacy)
Generate a text completion for a given prompt. This is a legacy endpoint maintained for backward compatibility. For new integrations, we recommend using the Chat Completions endpoint instead.
Note: The Chat Completions API (/v1/chat/completions) is the preferred way to interact with Tensoras models. It supports multi-turn conversations, tool use, RAG, and structured output. This legacy endpoint provides basic prompt-in, completion-out functionality only.
Endpoint
```
POST https://api.tensoras.ai/v1/completions
```

Authentication

```
Authorization: Bearer tns_your_key_here
```

Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | The model to use. One of llama-3.3-70b, llama-3.1-8b, qwen-3-32b, mistral-7b-instruct, deepseek-r1-distill-70b, codestral-latest. |
| prompt | string or array | Yes | — | The prompt(s) to generate completions for. Can be a string or an array of strings. |
| max_tokens | integer | No | 16 | The maximum number of tokens to generate. |
| temperature | number | No | 1.0 | Sampling temperature between 0 and 2. |
| top_p | number | No | 1.0 | Nucleus sampling parameter. |
| n | integer | No | 1 | Number of completions to generate for each prompt. |
| stream | boolean | No | false | Whether to stream partial results as server-sent events. |
| stop | string or array | No | — | Up to 4 sequences where the model will stop generating. |
| presence_penalty | number | No | 0.0 | Penalizes new tokens based on whether they appear in the text so far. Range: -2.0 to 2.0. |
| frequency_penalty | number | No | 0.0 | Penalizes new tokens based on their frequency in the text so far. Range: -2.0 to 2.0. |
| logprobs | integer | No | — | Include the log probabilities on the most likely tokens. Max value: 5. |
| suffix | string | No | — | The suffix that comes after the completion. |
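Several of these parameters interact, so it can help to validate a request body before sending it. The sketch below uses a hypothetical helper, `build_completion_request` (not part of any SDK), to illustrate the constraints stated in the table: the allowed parameter names, the 4-sequence limit on stop, and the maximum logprobs value of 5.

```python
import json

# Hypothetical helper: builds a /v1/completions payload locally and
# enforces the constraints from the parameter table above.
def build_completion_request(model, prompt, **opts):
    allowed = {"max_tokens", "temperature", "top_p", "n", "stream",
               "stop", "presence_penalty", "frequency_penalty",
               "logprobs", "suffix"}
    unknown = set(opts) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    stop = opts.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")
    if "logprobs" in opts and opts["logprobs"] > 5:
        raise ValueError("logprobs max value is 5")
    return {"model": model, "prompt": prompt, **opts}

# prompt may be a string or an array of strings; here we batch two prompts.
payload = build_completion_request(
    "llama-3.3-70b",
    ["The capital of France is", "The capital of Japan is"],
    max_tokens=32,
    n=2,
    stop=["\n"],
)
print(json.dumps(payload, indent=2))
```

The resulting dict is what you would send as the JSON request body; with an array prompt and n=2, the response's choices array contains one entry per prompt-completion pair.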
Response Body
```json
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1709123456,
  "model": "llama-3.3-70b",
  "choices": [
    {
      "text": " is the capital of France and is known for the Eiffel Tower.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 14,
    "total_tokens": 19
  }
}
```

| Field | Type | Description |
|---|---|---|
| id | string | A unique identifier for the completion. |
| object | string | Always "text_completion". |
| created | integer | Unix timestamp of when the completion was created. |
| model | string | The model used for the completion. |
| choices | array | A list of generated completions. |
| choices[].text | string | The generated text. |
| choices[].index | integer | The index of the choice. |
| choices[].logprobs | object or null | Log probability information, if requested. |
| choices[].finish_reason | string | Why the model stopped. One of "stop", "length". |
| usage | object | Token usage statistics. |
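In practice the most useful fields are choices[].text, choices[].finish_reason, and usage. The sketch below walks the example response above (as a parsed Python dict) and shows one common check: a finish_reason of "length" means the completion hit max_tokens and may be cut off mid-sentence, while "stop" means the model finished naturally or hit a stop sequence.

```python
# The example response body above, parsed into a Python dict
# (e.g. what you would get from response.json()).
completion = {
    "id": "cmpl-abc123",
    "object": "text_completion",
    "created": 1709123456,
    "model": "llama-3.3-70b",
    "choices": [
        {
            "text": " is the capital of France and is known for the Eiffel Tower.",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 14, "total_tokens": 19},
}

choice = completion["choices"][0]
# "length" indicates the output was truncated at max_tokens; consider
# retrying with a larger max_tokens if you see it.
if choice["finish_reason"] == "length":
    print("warning: completion may be truncated")
print(choice["text"])
print("total tokens billed:", completion["usage"]["total_tokens"])
```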
Examples
Basic Completion
curl

```bash
curl https://api.tensoras.ai/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "prompt": "Paris",
    "max_tokens": 64,
    "temperature": 0.7
  }'
```

Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)

response = client.completions.create(
    model="llama-3.3-70b",
    prompt="Paris",
    max_tokens=64,
    temperature=0.7,
)
print(response.choices[0].text)
```

Node.js
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});

const response = await client.completions.create({
  model: "llama-3.3-70b",
  prompt: "Paris",
  max_tokens: 64,
  temperature: 0.7,
});
console.log(response.choices[0].text);
```

Streaming Completion
Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)

stream = client.completions.create(
    model="llama-3.3-70b",
    prompt="Once upon a time",
    max_tokens=128,
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].text, end="")
```

Migrating to Chat Completions
To migrate from the Completions API to Chat Completions, convert your prompt into a messages array. Note that the response shape changes as well: the generated text moves from choices[0].text to choices[0].message.content.
```python
# Legacy completions
response = client.completions.create(
    model="llama-3.3-70b",
    prompt="Explain quantum computing in simple terms.",
)

# Equivalent chat completions (recommended)
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
)
```

Error Handling
Errors follow a standard format:
```json
{
  "error": {
    "message": "Invalid model: unknown-model",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}
```
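When a request fails, the error object gives you everything needed for a useful log line. As a sketch, the hypothetical helper below (not part of any SDK) flattens the error format above into a single readable message:

```python
# Hypothetical helper: turn an error response body into a readable
# one-line message. Field names follow the error format shown above.
def format_api_error(body: dict) -> str:
    err = body.get("error", {})
    parts = [err.get("type", "error"), err.get("message", "")]
    if err.get("param"):
        parts.append(f"(param: {err['param']})")
    if err.get("code"):
        parts.append(f"[{err['code']}]")
    return " ".join(p for p in parts if p)

body = {
    "error": {
        "message": "Invalid model: unknown-model",
        "type": "invalid_request_error",
        "param": "model",
        "code": "model_not_found",
    }
}
print(format_api_error(body))
# invalid_request_error Invalid model: unknown-model (param: model) [model_not_found]
```

For a model_not_found error like this one, check the param field ("model") against the model list in the Request Body table.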