# Python SDK
The official Tensoras Python SDK provides a typed, ergonomic client for the Tensoras.ai API with both synchronous and asynchronous interfaces.
## Installation

```shell
pip install tensoras
```

Requires Python 3.8+.
## Quick Start

```python
from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Explain RAG in one sentence."},
    ],
)
print(response.choices[0].message.content)
```

## Authentication
The client looks for an API key in this order:

- The `api_key` parameter passed to the constructor.
- The `TENSORAS_API_KEY` environment variable.
```shell
export TENSORAS_API_KEY="tns_your_key_here"
```

```python
from tensoras import Tensoras

client = Tensoras()  # reads TENSORAS_API_KEY from env
```
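The lookup order can be sketched in plain Python. This is illustrative only — `resolve_api_key` is not part of the SDK; the real client performs an equivalent check internally:

```python
import os

def resolve_api_key(api_key=None):
    # An explicit constructor argument wins; otherwise fall back to the
    # TENSORAS_API_KEY environment variable.
    key = api_key or os.environ.get("TENSORAS_API_KEY")
    if key is None:
        raise RuntimeError("No API key found: pass api_key= or set TENSORAS_API_KEY")
    return key

os.environ["TENSORAS_API_KEY"] = "tns_from_env"
print(resolve_api_key("tns_explicit"))  # the explicit argument takes precedence
print(resolve_api_key())                # falls back to the environment variable
```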
## Custom Base URL

Point the client at a different endpoint for local development or self-hosted deployments:

```python
client = Tensoras(
    api_key="tns_...",
    base_url="http://localhost:8000/v1",
)
```

The default base URL is `https://api.tensoras.ai/v1`.
## Async Client

Use `AsyncTensoras` for async/await workflows (FastAPI, async scripts, etc.):

```python
import asyncio

from tensoras import AsyncTensoras

client = AsyncTensoras()  # reads TENSORAS_API_KEY from env

async def main():
    response = await client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "user", "content": "What is hybrid search?"},
        ],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```

## Available Resources
The client exposes the full Tensoras API through typed resource objects:

| Resource | Description |
|---|---|
| `client.chat.completions` | Chat completions (streaming and non-streaming) |
| `client.embeddings` | Text embeddings |
| `client.rerank` | Reranking |
| `client.models` | List and retrieve models |
| `client.files` | Upload and manage files |
| `client.batches` | Batch processing |
| `client.knowledge_bases` | Create and manage Knowledge Bases |
## Chat Completions

### Basic Request
```python
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```

### Streaming
```python
stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Write a short poem about APIs."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

### Async Streaming
```python
import asyncio

from tensoras import AsyncTensoras

client = AsyncTensoras()

async def main():
    stream = await client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "user", "content": "Write a short poem about APIs."},
        ],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()

asyncio.run(main())
```

### Structured Outputs
Force the model to return JSON conforming to a specific schema using `response_format`:

#### JSON Object Mode
```python
import json

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "Return JSON with keys: name, genre, year."},
        {"role": "user", "content": "Tell me about Inception."},
    ],
    response_format={"type": "json_object"},
)
data = json.loads(response.choices[0].message.content)
```

#### JSON Schema Mode
```python
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "Extract movie data."},
        {"role": "user", "content": "Tell me about Inception."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "movie",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "year": {"type": "integer"},
                    "genre": {"type": "string"},
                },
                "required": ["name", "year", "genre"],
                "additionalProperties": False,
            },
        },
    },
)
```

You can also use the typed models:
```python
from tensoras.types import ResponseFormatJsonSchema, JsonSchemaConfig

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Tell me about Inception."},
    ],
    response_format=ResponseFormatJsonSchema(
        json_schema=JsonSchemaConfig(
            name="movie",
            strict=True,
            schema={
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "year": {"type": "integer"},
                },
                "required": ["name", "year"],
                "additionalProperties": False,
            },
        ),
    ),
)
```

See Structured Outputs for full details on schema support and best practices.
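However the schema is specified, consuming the result is a plain `json.loads`. A minimal sketch — the raw string below is a stand-in for an actual model response under the strict `movie` schema:

```python
import json

# Stand-in for response.choices[0].message.content.
raw = '{"name": "Inception", "year": 2010, "genre": "sci-fi"}'

movie = json.loads(raw)

# With strict=True the output is guaranteed to match the schema, so
# direct key access is safe (no defensive .get() calls needed).
print(f"{movie['name']} ({movie['year']})")
```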
## Embeddings
```python
response = client.embeddings.create(
    model="gte-large-en-v1.5",
    input="The quick brown fox jumps over the lazy dog.",
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
```
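A common next step is comparing embeddings with cosine similarity. A dependency-free sketch — the two short vectors here are made up for illustration; real embeddings have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vec_a = [0.1, 0.3, 0.5]
vec_b = [0.2, 0.1, 0.4]
print(f"similarity: {cosine_similarity(vec_a, vec_b):.4f}")
```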
## Reranking

```python
response = client.rerank.create(
    model="bge-reranker-v2-m3",
    query="What is deep learning?",
    documents=[
        "Deep learning is a subset of machine learning.",
        "The weather today is sunny.",
        "Neural networks are the foundation of deep learning.",
    ],
)

for result in response.results:
    print(f"Index: {result.index}, Score: {result.relevance_score:.4f}")
```
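Each result carries an index back into the original `documents` list, so reordering the documents themselves is a one-liner. A sketch using hypothetical `(index, relevance_score)` pairs shaped like the response above:

```python
documents = [
    "Deep learning is a subset of machine learning.",
    "The weather today is sunny.",
    "Neural networks are the foundation of deep learning.",
]

# Hypothetical (index, relevance_score) pairs, as in response.results.
results = [(0, 0.9231), (2, 0.8710), (1, 0.0042)]

# Sort by score descending, then map indices back to the documents.
ranked = [documents[i] for i, _score in sorted(results, key=lambda r: r[1], reverse=True)]
print(ranked[0])
```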
## Knowledge Bases

### Create a Knowledge Base
```python
kb = client.knowledge_bases.create(
    name="support-docs",
    description="Customer support documentation",
)
print(kb.id)  # e.g. "kb_a1b2c3d4"
```

### List Knowledge Bases
```python
knowledge_bases = client.knowledge_bases.list()
for kb in knowledge_bases.data:
    print(f"{kb.id}: {kb.name}")
```

### Retrieve a Knowledge Base
```python
kb = client.knowledge_bases.retrieve("kb_a1b2c3d4")
print(kb.name, kb.status)
```

### Add a Data Source
```python
with open("handbook.pdf", "rb") as f:
    data_source = client.knowledge_bases.data_sources.create(
        knowledge_base_id="kb_a1b2c3d4",
        type="file_upload",
        file=f,
    )
print(data_source.status)  # "processing" -> "completed"
```

### List Documents
```python
documents = client.knowledge_bases.documents.list(
    knowledge_base_id="kb_a1b2c3d4",
)
for doc in documents.data:
    print(f"{doc.id}: {doc.filename} ({doc.status})")
```

### Query with RAG
```python
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "How do I reset my password?"},
    ],
    knowledge_bases=["kb_a1b2c3d4"],
)
print(response.choices[0].message.content)

for citation in response.citations:
    print(f"  Source: {citation.source}, Score: {citation.score:.3f}")
```

## Error Handling
The SDK raises typed exceptions that you can catch individually:
```python
from tensoras import Tensoras, TensorasAPIError, AuthenticationError, RateLimitError

client = Tensoras()

try:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": "Hello"}],
    )
except AuthenticationError:
    print("Invalid or missing API key.")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s.")
except TensorasAPIError as e:
    print(f"API error {e.status_code}: {e.message}")
```

### Exception Hierarchy
| Exception | Status Code | Description |
|---|---|---|
| `TensorasAPIError` | — | Base class for all API errors |
| `AuthenticationError` | 401 | Invalid or missing API key |
| `PermissionDeniedError` | 403 | Key lacks required permissions |
| `NotFoundError` | 404 | Resource not found |
| `RateLimitError` | 429 | Too many requests |
| `InternalServerError` | 500+ | Server-side error |
## Automatic Retries
The SDK automatically retries failed requests up to 3 times with exponential backoff for transient errors (429, 500, 502, 503, 504). You can customize this:
```python
client = Tensoras(
    max_retries=5,  # default: 3
    timeout=60.0,   # request timeout in seconds, default: 120
)
```

To disable retries:
```python
client = Tensoras(max_retries=0)
```
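The SDK's exact backoff schedule is internal, but exponential backoff with jitter generally looks like the following sketch. The base delay, cap, and seeding are illustrative, not the SDK's actual values:

```python
import random

def backoff_delays(max_retries=3, base=0.5, cap=8.0, seed=0):
    # The wait window doubles each attempt (base * 2**n), is capped,
    # and the actual delay is drawn uniformly from it ("full jitter").
    rng = random.Random(seed)  # seeded only to keep the sketch reproducible
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(max_retries)]

print(backoff_delays())  # three waits, each drawn from a growing window
```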
## Models

```python
models = client.models.list()
for model in models.data:
    print(f"{model.id}: {model.owned_by}")
```

## Next Steps
- Node.js SDK — JavaScript/TypeScript client
- OpenAI-Compatible Usage — use the OpenAI SDK with Tensoras
- Streaming — SSE details and cancellation
- Tool Calling — let the model invoke functions
- RAG Overview — end-to-end retrieval-augmented generation