# Migrate from OpenAI
Tensoras.ai implements the OpenAI API specification, so migrating an existing OpenAI integration takes just a few minutes. This guide walks through each step and highlights the differences you should know about.
## Step 1: Get a Tensoras API Key

- Sign up or log in at cloud.tensoras.ai.
- Navigate to Console > API Keys.
- Click Create Key, give it a name, and copy the key. It starts with `tns_`.

```shell
export TENSORAS_API_KEY="tns_your_key_here"
```

## Step 2: Update Base URL and API Key
The only code change required is pointing your client at the Tensoras endpoint and swapping the key.
### Python (OpenAI SDK)

```python
# Before -- OpenAI
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
)

# After -- Tensoras
from openai import OpenAI

client = OpenAI(
    api_key="tns_your_key_here",
    base_url="https://api.tensoras.ai/v1",
)
```

Or use the native Tensoras SDK, which wraps the same API with additional helpers:

```python
from tensoras import Tensoras

client = Tensoras()  # reads TENSORAS_API_KEY from env
```

### Node.js (OpenAI SDK)
```javascript
// Before -- OpenAI
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-...",
});

// After -- Tensoras
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "tns_your_key_here",
  baseURL: "https://api.tensoras.ai/v1",
});
```

Or use the native Tensoras SDK:

```javascript
import Tensoras from "tensoras";

const client = new Tensoras(); // reads TENSORAS_API_KEY from env
```

### curl
```shell
# Before -- OpenAI
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{ ... }'

# After -- Tensoras
curl https://api.tensoras.ai/v1/chat/completions \
  -H "Authorization: Bearer $TENSORAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ ... }'
```

## Step 3: Map Your Models
Replace OpenAI model names with the corresponding Tensoras models:
| OpenAI Model | Tensoras Model | Notes |
|---|---|---|
| `gpt-4o` | `llama-3.3-70b` | Best overall quality, $0.20/$0.60 per M tokens |
| `gpt-4o-mini` | `llama-3.1-8b` | Fast and cheap, $0.05/$0.10 per M tokens |
| `o1` / `o1-mini` | `deepseek-r1-distill-70b` | Chain-of-thought reasoning, $0.15/$0.45 per M tokens |
| `gpt-3.5-turbo` | `mistral-7b-instruct` | Budget inference, $0.04/$0.08 per M tokens |
| `text-embedding-3-small` | `bge-large-en-v1.5` | Embeddings |
| `text-embedding-3-large` | `bge-large-en-v1.5` | Embeddings |
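If you reference model names in several places, the table above can be expressed as a small lookup helper. The `MODEL_MAP` dict and `map_model` function below are illustrative and not part of either SDK:

```python
# Mapping from OpenAI model names to their Tensoras equivalents,
# taken from the table above (illustrative helper, not an SDK API).
MODEL_MAP = {
    "gpt-4o": "llama-3.3-70b",
    "gpt-4o-mini": "llama-3.1-8b",
    "o1": "deepseek-r1-distill-70b",
    "o1-mini": "deepseek-r1-distill-70b",
    "gpt-3.5-turbo": "mistral-7b-instruct",
    "text-embedding-3-small": "bge-large-en-v1.5",
    "text-embedding-3-large": "bge-large-en-v1.5",
}

def map_model(openai_model: str) -> str:
    """Return the Tensoras equivalent of an OpenAI model name."""
    try:
        return MODEL_MAP[openai_model]
    except KeyError:
        raise ValueError(f"No Tensoras equivalent for {openai_model!r}")

print(map_model("gpt-4o"))  # llama-3.3-70b
```

Failing loudly on unknown names is deliberate: it surfaces any model string you forgot to migrate instead of silently sending it to the API.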
### Before and After — Python
```python
# Before -- OpenAI
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing."},
    ],
)

# After -- Tensoras
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing."},
    ],
)
```

### Before and After — Node.js
```javascript
// Before -- OpenAI
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing." },
  ],
});

// After -- Tensoras
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing." },
  ],
});
```

## What’s Compatible
Tensoras supports the same request and response formats as OpenAI for the following features:
- Chat Completions — `POST /v1/chat/completions` with messages, temperature, top_p, max_tokens, stop sequences, and more
- Streaming — Server-Sent Events with `stream: true`, same delta format
- Tool Calling — `tools` and `tool_choice` parameters work identically
- JSON Mode — `response_format: { type: "json_object" }` is supported
- Structured Outputs — `response_format: { type: "json_schema", json_schema: {...} }` is supported
- Embeddings — `POST /v1/embeddings` with the same request/response shape
- Files — `POST /v1/files` for uploading documents
Any code that uses these features through the OpenAI SDK will work with Tensoras after updating the base URL, API key, and model name.
## What’s Different
While the core API is compatible, there are a few differences to be aware of:
### Model Names
Tensoras hosts open-source models. You must use Tensoras model identifiers (e.g., `llama-3.3-70b`) instead of OpenAI model names (e.g., `gpt-4o`). See the mapping table above.
### Knowledge Bases (Tensoras Extension)
Tensoras adds a knowledge_bases parameter to the chat completions endpoint. This lets you attach one or more Knowledge Bases for retrieval-augmented generation directly in the API call — no separate retrieval step required:
```python
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "What is our refund policy?"},
    ],
    knowledge_bases=["kb_a1b2c3d4"],
)
```

This parameter is not part of the OpenAI spec and will be ignored if you send it to OpenAI. See the RAG Overview for details.
### Pricing Model
OpenAI and Tensoras both charge per token, but Tensoras pricing is significantly lower because it serves open-source models on optimized infrastructure. See the Billing guide for the full pricing table.
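As a back-of-the-envelope comparison, you can estimate per-request cost from the $/M-token rates quoted in the mapping table above. The `estimate_cost` helper is illustrative only; consult the Billing guide for current rates:

```python
# Illustrative cost estimate using the (input, output) $/M-token
# rates from the model mapping table above -- check the Billing
# guide before relying on these numbers.
PRICES_PER_M = {
    "llama-3.3-70b": (0.20, 0.60),
    "llama-3.1-8b": (0.05, 0.10),
    "deepseek-r1-distill-70b": (0.15, 0.45),
    "mistral-7b-instruct": (0.04, 0.08),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    in_rate, out_rate = PRICES_PER_M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 1M input tokens plus 1M output tokens on llama-3.3-70b:
cost = estimate_cost("llama-3.3-70b", 1_000_000, 1_000_000)
print(f"${cost:.2f}")  # $0.80
```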
### No Assistants API
Tensoras does not implement the OpenAI Assistants API. If you use Assistants, Threads, or Runs, you will need to migrate that logic to direct chat completions with tool calling.
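One way to restructure Assistants-style code is to carry the conversation history yourself and send the whole list to chat completions on each turn. The `Thread` class below is a hypothetical stand-in for code that previously used OpenAI Threads and Runs, not an SDK API:

```python
# Sketch: replace an Assistants Thread with self-managed history.
# The Thread class is a hypothetical stand-in, not an SDK API.
class Thread:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user_message(self, content: str) -> None:
        self.messages.append({"role": "user", "content": content})

    def add_assistant_message(self, content: str) -> None:
        self.messages.append({"role": "assistant", "content": content})

thread = Thread("You are a helpful assistant.")
thread.add_user_message("Explain quantum computing.")

# Instead of creating a Run on a server-side Thread, send the full
# history on every turn and append the reply:
#   response = client.chat.completions.create(
#       model="llama-3.3-70b",
#       messages=thread.messages,
#   )
#   thread.add_assistant_message(response.choices[0].message.content)
print(len(thread.messages))  # 2
```

Tool definitions that lived on an Assistant move into the `tools` parameter of each chat completions call.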
## Full Migration Checklist
- Create a Tensoras API key at cloud.tensoras.ai
- Update `base_url` / `baseURL` to `https://api.tensoras.ai/v1`
- Replace `api_key` / `apiKey` with your `tns_` key
- Replace OpenAI model names with Tensoras equivalents
- Test chat completions, streaming, and any tool calling flows
- If using embeddings, switch to `bge-large-en-v1.5` and re-embed your data
- Update any hardcoded cost calculations to use Tensoras pricing
- (Optional) Switch from the OpenAI SDK to the native Tensoras Python SDK or Node.js SDK for Knowledge Base support
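Re-embedding a corpus is typically done in batches. The `batched` helper below is a plain-Python sketch; the embeddings call shown in the comment assumes the OpenAI-compatible `/v1/embeddings` endpoint described above:

```python
# Sketch: split a corpus into fixed-size batches for re-embedding.
def batched(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

docs = [f"document {i}" for i in range(5)]
for batch in batched(docs, size=2):
    # With a real client, each batch would be embedded like:
    #   client.embeddings.create(model="bge-large-en-v1.5", input=batch)
    print(len(batch))  # 2, 2, 1
```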
## Next Steps
- Quickstart — make your first Tensoras API call
- OpenAI-Compatible Usage — detailed OpenAI SDK configuration
- Billing — pricing details for all models
- RAG Overview — leverage Knowledge Bases for grounded responses