
Completions (Legacy)

Generate a text completion for a given prompt. This is a legacy endpoint maintained for backward compatibility. For new integrations, we recommend using the Chat Completions endpoint instead.

Note: The Chat Completions API (/v1/chat/completions) is the preferred way to interact with Tensoras models. It supports multi-turn conversations, tool use, retrieval-augmented generation (RAG), and structured output. This legacy endpoint provides basic prompt-in, completion-out functionality only.

Endpoint

POST https://api.tensoras.ai/v1/completions

Authentication

Authorization: Bearer tns_your_key_here

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model` | string | Yes | — | The model to use. One of `llama-3.3-70b`, `llama-3.1-8b`, `qwen-3-32b`, `mistral-7b-instruct`, `deepseek-r1-distill-70b`, `codestral-latest`. |
| `prompt` | string or array | Yes | — | The prompt(s) to generate completions for. Can be a string or an array of strings. |
| `max_tokens` | integer | No | 16 | The maximum number of tokens to generate. |
| `temperature` | number | No | 1.0 | Sampling temperature between 0 and 2. |
| `top_p` | number | No | 1.0 | Nucleus sampling parameter. |
| `n` | integer | No | 1 | Number of completions to generate for each prompt. |
| `stream` | boolean | No | false | Whether to stream partial results as server-sent events. |
| `stop` | string or array | No | — | Up to 4 sequences where the model will stop generating. |
| `presence_penalty` | number | No | 0.0 | Penalizes new tokens based on whether they appear in the text so far. Range: -2.0 to 2.0. |
| `frequency_penalty` | number | No | 0.0 | Penalizes new tokens based on their frequency in the text so far. Range: -2.0 to 2.0. |
| `logprobs` | integer | No | — | Include the log probabilities on the most likely tokens. Max value: 5. |
| `suffix` | string | No | — | The suffix that comes after the completion. |
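The documented ranges above can be checked client-side before sending a request. The helper below is an illustrative sketch only (the function name and checks are ours, not part of the API, which performs its own server-side validation):

```python
def build_completion_request(model, prompt, **params):
    """Assemble a /v1/completions request body, validating documented ranges.

    Illustrative helper only -- not part of any SDK.
    """
    if not isinstance(prompt, (str, list)):
        raise TypeError("prompt must be a string or an array of strings")
    body = {"model": model, "prompt": prompt}
    # Ranges taken from the parameter table above.
    ranges = {
        "temperature": (0.0, 2.0),
        "top_p": (0.0, 1.0),
        "presence_penalty": (-2.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
        "logprobs": (0, 5),
    }
    for name, value in params.items():
        if name in ranges:
            lo, hi = ranges[name]
            if not lo <= value <= hi:
                raise ValueError(f"{name} must be between {lo} and {hi}")
        body[name] = value
    return body
```

A request that exceeds a documented range (for example, `temperature=3.0`) raises `ValueError` locally instead of producing a 400 from the server.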

Response Body

{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1709123456,
  "model": "llama-3.3-70b",
  "choices": [
    {
      "text": " is the capital of France and is known for the Eiffel Tower.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 14,
    "total_tokens": 19
  }
}

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | A unique identifier for the completion. |
| `object` | string | Always `"text_completion"`. |
| `created` | integer | Unix timestamp of when the completion was created. |
| `model` | string | The model used for the completion. |
| `choices` | array | A list of generated completions. |
| `choices[].text` | string | The generated text. |
| `choices[].index` | integer | The index of the choice. |
| `choices[].logprobs` | object or null | Log probability information, if requested. |
| `choices[].finish_reason` | string | Why the model stopped. One of `"stop"`, `"length"`. |
| `usage` | object | Token usage statistics. |
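A `finish_reason` of `"length"` means generation stopped at `max_tokens` rather than at a natural endpoint, so the text may be cut off mid-sentence. A minimal sketch for reading a choice from the parsed JSON body and flagging truncation (the helper is illustrative, not part of any SDK):

```python
# Parsed response body, matching the example shown above.
sample_response = {
    "choices": [
        {
            "text": " is the capital of France and is known for the Eiffel Tower.",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
}

def completion_text(response, index=0):
    """Return one choice's text, warning when it was truncated at max_tokens."""
    choice = response["choices"][index]
    if choice["finish_reason"] == "length":
        # Hit the max_tokens budget; consider retrying with a larger value.
        print("warning: completion truncated at max_tokens")
    return choice["text"]
```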

Examples

Basic Completion

curl

curl https://api.tensoras.ai/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "model": "llama-3.3-70b",
    "prompt": "Paris",
    "max_tokens": 64,
    "temperature": 0.7
  }'

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
response = client.completions.create(
    model="llama-3.3-70b",
    prompt="Paris",
    max_tokens=64,
    temperature=0.7,
)
 
print(response.choices[0].text)

Node.js

import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://api.tensoras.ai/v1",
  apiKey: "tns_your_key_here",
});
 
const response = await client.completions.create({
  model: "llama-3.3-70b",
  prompt: "Paris",
  max_tokens: 64,
  temperature: 0.7,
});
 
console.log(response.choices[0].text);

Streaming Completion

Python

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tensoras.ai/v1",
    api_key="tns_your_key_here",
)
 
stream = client.completions.create(
    model="llama-3.3-70b",
    prompt="Once upon a time",
    max_tokens=128,
    stream=True,
)
 
for chunk in stream:
    print(chunk.choices[0].text, end="")
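If the full text is needed after streaming, the per-chunk deltas can be collected as they arrive. A minimal sketch, assuming each chunk exposes `choices[0].text` as in the loop above; the `SimpleNamespace` objects below only stand in for SDK chunk objects so the snippet runs without a network call:

```python
from types import SimpleNamespace

def accumulate(stream):
    """Concatenate the text deltas from a completions stream into one string."""
    return "".join(chunk.choices[0].text for chunk in stream)

# Illustrative stand-ins for streamed chunk objects (no network required).
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(text=t)])
    for t in ("Once", " upon", " a", " time")
]
```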

Migrating to Chat Completions

To migrate from the Completions API to Chat Completions, convert your prompt to a messages array:

# Legacy completions
response = client.completions.create(
    model="llama-3.3-70b",
    prompt="Explain quantum computing in simple terms.",
)
 
# Equivalent chat completions (recommended)
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
)
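The conversion can be wrapped in a small helper. This is an illustrative sketch (the function name is ours, not part of any SDK); note that a legacy array-of-prompts request maps to one chat request per prompt, since Chat Completions takes a single `messages` array per call:

```python
def to_chat_messages(prompt):
    """Convert a legacy completions prompt into a chat messages array.

    A string becomes a single-message array; an array of prompts becomes
    one messages array per prompt (one chat request each).
    """
    if isinstance(prompt, list):
        return [to_chat_messages(p) for p in prompt]
    return [{"role": "user", "content": prompt}]
```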

Error Handling

Errors follow a standard format:

{
  "error": {
    "message": "Invalid model: unknown-model",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}
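
The error envelope above can be turned into a readable log line before raising or retrying. A minimal sketch over the documented fields (the helper name is ours; `param` and `code` may be null for some error types, so they are treated as optional):

```python
def describe_error(body):
    """Format the standard error envelope into a single log line."""
    err = body["error"]
    line = f"{err['type']}: {err['message']}"
    if err.get("param"):
        line += f" (param: {err['param']})"
    if err.get("code"):
        line += f" [code: {err['code']}]"
    return line

# The example error body shown above.
sample_error = {
    "error": {
        "message": "Invalid model: unknown-model",
        "type": "invalid_request_error",
        "param": "model",
        "code": "model_not_found",
    }
}
```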