# Python SDK
The official Tensoras Python SDK provides a typed, ergonomic client for the Tensoras.ai API with both synchronous and asynchronous interfaces.
## Installation

```shell
pip install tensoras
```

Requires Python 3.8+.
## Quick Start

```python
from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Explain RAG in one sentence."},
    ],
)
print(response.choices[0].message.content)
```

## Authentication
The client looks for an API key in this order:

- The `api_key` parameter passed to the constructor.
- The `TENSORAS_API_KEY` environment variable.
```shell
export TENSORAS_API_KEY="tns_your_key_here"
```

```python
from tensoras import Tensoras

client = Tensoras()  # reads TENSORAS_API_KEY from env
```
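The lookup order can be sketched in plain Python. This is illustrative only — `resolve_api_key` is not part of the SDK; the real client performs an equivalent check internally:

```python
import os

def resolve_api_key(api_key=None):
    # An explicit constructor argument wins; otherwise fall back to the
    # TENSORAS_API_KEY environment variable.
    key = api_key or os.environ.get("TENSORAS_API_KEY")
    if key is None:
        raise RuntimeError("No API key found: pass api_key= or set TENSORAS_API_KEY")
    return key

os.environ["TENSORAS_API_KEY"] = "tns_from_env"
print(resolve_api_key("tns_explicit"))  # the explicit argument takes precedence
print(resolve_api_key())                # falls back to the environment variable
```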
## Custom Base URL

Point the client at a different endpoint for local development or self-hosted deployments:

```python
client = Tensoras(
    api_key="tns_...",
    base_url="http://localhost:8000/v1",
)
```

The default base URL is `https://api.tensoras.ai/v1`.
## Async Client

Use `AsyncTensoras` for async/await workflows (FastAPI, async scripts, etc.):

```python
import asyncio

from tensoras import AsyncTensoras

client = AsyncTensoras()  # reads TENSORAS_API_KEY from env

async def main():
    response = await client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "user", "content": "What is hybrid search?"},
        ],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```

## Available Resources
The client exposes the full Tensoras API through typed resource objects:

| Resource | Description |
|---|---|
| `client.chat.completions` | Chat completions (streaming and non-streaming) |
| `client.embeddings` | Text embeddings |
| `client.rerank` | Reranking |
| `client.models` | List and retrieve models |
| `client.files` | Upload and manage files |
| `client.batches` | Batch processing |
| `client.knowledge_bases` | Create and manage Knowledge Bases |
## Chat Completions

### Basic Request
```python
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```

### Streaming
```python
stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Write a short poem about APIs."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

### Async Streaming
```python
import asyncio

from tensoras import AsyncTensoras

client = AsyncTensoras()

async def main():
    stream = await client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "user", "content": "Write a short poem about APIs."},
        ],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()

asyncio.run(main())
```

### Structured Outputs
Force the model to return JSON conforming to a specific schema using `response_format`:

#### JSON Object Mode
```python
import json

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "Return JSON with keys: name, genre, year."},
        {"role": "user", "content": "Tell me about Inception."},
    ],
    response_format={"type": "json_object"},
)
data = json.loads(response.choices[0].message.content)
```

#### JSON Schema Mode
```python
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "Extract movie data."},
        {"role": "user", "content": "Tell me about Inception."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "movie",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "year": {"type": "integer"},
                    "genre": {"type": "string"},
                },
                "required": ["name", "year", "genre"],
                "additionalProperties": False,
            },
        },
    },
)
```

You can also use the typed models:
```python
from tensoras.types import ResponseFormatJsonSchema, JsonSchemaConfig

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Tell me about Inception."},
    ],
    response_format=ResponseFormatJsonSchema(
        json_schema=JsonSchemaConfig(
            name="movie",
            strict=True,
            schema={
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "year": {"type": "integer"},
                },
                "required": ["name", "year"],
                "additionalProperties": False,
            },
        ),
    ),
)
```

See Structured Outputs for full details on schema support and best practices.
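However the schema is specified, consuming the result is a plain `json.loads`. A minimal sketch — the raw string below is a stand-in for an actual model response under the strict `movie` schema:

```python
import json

# Stand-in for response.choices[0].message.content.
raw = '{"name": "Inception", "year": 2010, "genre": "sci-fi"}'

movie = json.loads(raw)

# With strict=True the output is guaranteed to match the schema, so
# direct key access is safe (no defensive .get() calls needed).
print(f"{movie['name']} ({movie['year']})")
```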
## Embeddings
```python
response = client.embeddings.create(
    model="gte-large-en-v1.5",
    input="The quick brown fox jumps over the lazy dog.",
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
```
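A common next step is comparing embeddings with cosine similarity. A dependency-free sketch — the two short vectors here are made up for illustration; real embeddings have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vec_a = [0.1, 0.3, 0.5]
vec_b = [0.2, 0.1, 0.4]
print(f"similarity: {cosine_similarity(vec_a, vec_b):.4f}")
```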
## Reranking

```python
response = client.rerank.create(
    model="bge-reranker-v2-m3",
    query="What is deep learning?",
    documents=[
        "Deep learning is a subset of machine learning.",
        "The weather today is sunny.",
        "Neural networks are the foundation of deep learning.",
    ],
)

for result in response.results:
    print(f"Index: {result.index}, Score: {result.relevance_score:.4f}")
```
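Each result carries an index back into the original `documents` list, so reordering the documents themselves is a one-liner. A sketch using hypothetical `(index, relevance_score)` pairs shaped like the response above:

```python
documents = [
    "Deep learning is a subset of machine learning.",
    "The weather today is sunny.",
    "Neural networks are the foundation of deep learning.",
]

# Hypothetical (index, relevance_score) pairs, as in response.results.
results = [(0, 0.9231), (2, 0.8710), (1, 0.0042)]

# Sort by score descending, then map indices back to the documents.
ranked = [documents[i] for i, _score in sorted(results, key=lambda r: r[1], reverse=True)]
print(ranked[0])
```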
## Knowledge Bases

### Create a Knowledge Base
```python
kb = client.knowledge_bases.create(
    name="support-docs",
    description="Customer support documentation",
)
print(kb.id)  # e.g. "kb_a1b2c3d4"
```

### List Knowledge Bases
```python
knowledge_bases = client.knowledge_bases.list()
for kb in knowledge_bases.data:
    print(f"{kb.id}: {kb.name}")
```

### Retrieve a Knowledge Base
```python
kb = client.knowledge_bases.retrieve("kb_a1b2c3d4")
print(kb.name, kb.status)
```

### Add a Data Source
```python
with open("handbook.pdf", "rb") as f:
    data_source = client.knowledge_bases.data_sources.create(
        knowledge_base_id="kb_a1b2c3d4",
        type="file_upload",
        file=f,
    )
print(data_source.status)  # "processing" -> "completed"
```

### List Documents
```python
documents = client.knowledge_bases.documents.list(
    knowledge_base_id="kb_a1b2c3d4",
)
for doc in documents.data:
    print(f"{doc.id}: {doc.filename} ({doc.status})")
```

### Query with RAG
```python
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "How do I reset my password?"},
    ],
    knowledge_bases=["kb_a1b2c3d4"],
)
print(response.choices[0].message.content)

for citation in response.citations:
    print(f"  Source: {citation.source}, Score: {citation.score:.3f}")
```

## Error Handling
The SDK raises typed exceptions that you can catch individually:
```python
from tensoras import Tensoras, TensorasAPIError, AuthenticationError, RateLimitError

client = Tensoras()

try:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": "Hello"}],
    )
except AuthenticationError:
    print("Invalid or missing API key.")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s.")
except TensorasAPIError as e:
    print(f"API error {e.status_code}: {e.message}")
```

### Exception Hierarchy
| Exception | Status Code | Description |
|---|---|---|
| `TensorasAPIError` | — | Base class for all API errors |
| `AuthenticationError` | 401 | Invalid or missing API key |
| `PermissionDeniedError` | 403 | Key lacks required permissions |
| `NotFoundError` | 404 | Resource not found |
| `RateLimitError` | 429 | Too many requests |
| `InternalServerError` | 500+ | Server-side error |
## Automatic Retries
The SDK automatically retries failed requests up to 3 times with exponential backoff for transient errors (429, 500, 502, 503, 504). You can customize this:
```python
client = Tensoras(
    max_retries=5,  # default: 3
    timeout=60.0,   # request timeout in seconds, default: 120
)
```

To disable retries:
```python
client = Tensoras(max_retries=0)
```
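The SDK's exact backoff schedule is internal, but exponential backoff with jitter generally looks like the following sketch. The base delay, cap, and seeding are illustrative, not the SDK's actual values:

```python
import random

def backoff_delays(max_retries=3, base=0.5, cap=8.0, seed=0):
    # The wait window doubles each attempt (base * 2**n), is capped,
    # and the actual delay is drawn uniformly from it ("full jitter").
    rng = random.Random(seed)  # seeded only to keep the sketch reproducible
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(max_retries)]

print(backoff_delays())  # three waits, each drawn from a growing window
```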
## Models

```python
models = client.models.list()
for model in models.data:
    print(f"{model.id}: {model.owned_by}")
```

## Next Steps
- Node.js SDK — JavaScript/TypeScript client
- OpenAI-Compatible Usage — use the OpenAI SDK with Tensoras
- Streaming — SSE details and cancellation
- Tool Calling — let the model invoke functions
- RAG Overview — end-to-end retrieval-augmented generation