# LlamaIndex

Use Tensoras.ai as the LLM, embeddings, and retrieval backend in your LlamaIndex applications.
## Installation

```bash
pip install llama-index-tensoras
```

This installs the Tensoras integration for LlamaIndex. It requires `llama-index-core>=0.11`.
## Authentication

Set your API key as an environment variable:

```bash
export TENSORAS_API_KEY="tns_your_key_here"
```

Or pass it directly to each component.
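In scripts it can help to fail fast, with a clear message, when the key is missing. A minimal sketch (the helper name is ours, not part of the integration):

```python
import os

def get_tensoras_api_key() -> str:
    """Return the Tensoras API key from the environment, or raise with a clear message."""
    key = os.environ.get("TENSORAS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "TENSORAS_API_KEY is not set; export it or pass api_key= to each component."
        )
    return key
```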
## LLM

Use `TensorasLLM` as a LlamaIndex LLM:

```python
from llama_index_tensoras import TensorasLLM

llm = TensorasLLM(
    model="llama-3.3-70b",
    api_key="tns_your_key_here",  # or set TENSORAS_API_KEY
    temperature=0.7,
    max_tokens=512,
)

response = llm.complete("Explain RAG in one sentence.")
print(response.text)
```
### Chat

```python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What is hybrid search?"),
]

response = llm.chat(messages)
print(response.message.content)
```
### Streaming

```python
stream = llm.stream_complete("Write a poem about vector databases.")
for chunk in stream:
    print(chunk.delta, end="", flush=True)
```

```python
stream = llm.stream_chat(messages)
for chunk in stream:
    print(chunk.delta, end="", flush=True)
```
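Each chunk exposes its incremental text as `.delta`. If you also need the assembled string, the deltas can simply be concatenated; a sketch against that chunk shape:

```python
def collect_stream(chunks) -> str:
    """Join a stream of chunks into the full response text.

    Assumes each chunk exposes its incremental text as `.delta`, as in the
    loops above.
    """
    return "".join(chunk.delta for chunk in chunks)
```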
## Embeddings

Use `TensorasEmbedding` for document and query embedding:

```python
from llama_index_tensoras import TensorasEmbedding

embed_model = TensorasEmbedding(
    model="gte-large-en-v1.5",
    api_key="tns_your_key_here",  # or set TENSORAS_API_KEY
)

# Embed a single query
query_vector = embed_model.get_query_embedding("What is deep learning?")
print(f"Dimensions: {len(query_vector)}")

# Embed multiple texts
vectors = embed_model.get_text_embedding_batch([
    "Deep learning is a subset of machine learning.",
    "Neural networks have multiple layers.",
])
print(f"Embedded {len(vectors)} texts")
```
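The returned vectors are plain lists of floats, so downstream similarity math needs no extra dependencies. For example, cosine similarity between the query vector and each text vector (a standalone sketch, not part of the integration):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Rank embedded texts against the query, e.g.:
# scores = [cosine_similarity(query_vector, v) for v in vectors]
```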
## Retriever

Use `TensorasRetriever` to retrieve from a Tensoras Knowledge Base:

```python
from llama_index_tensoras import TensorasRetriever

retriever = TensorasRetriever(
    knowledge_base_id="kb_a1b2c3d4",
    api_key="tns_your_key_here",  # or set TENSORAS_API_KEY
    top_k=5,
)

nodes = retriever.retrieve("How do I reset my password?")
for node in nodes:
    print(f"Score: {node.score:.3f}")
    print(node.text[:200])
    print()
```
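Low-scoring matches can be dropped before they reach a prompt. A small filter over the retrieved nodes (our helper; it assumes only the `.score` attribute shown above):

```python
def filter_by_score(nodes, min_score):
    """Keep retrieved nodes whose relevance score meets the threshold.

    Assumes each node exposes a numeric (or None) `.score`, as in the loop above.
    """
    return [n for n in nodes if n.score is not None and n.score >= min_score]
```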
## Query Engine

Build a query engine that retrieves from a Tensoras Knowledge Base and generates answers:

```python
from llama_index_tensoras import TensorasLLM, TensorasRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

llm = TensorasLLM(model="llama-3.3-70b")
retriever = TensorasRetriever(knowledge_base_id="kb_a1b2c3d4", top_k=5)

query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    llm=llm,
)

response = query_engine.query("How do I reset my password?")
print(response.response)

# View source nodes
for node in response.source_nodes:
    print(f"  Source: {node.metadata.get('filename')}, Score: {node.score:.3f}")
```
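When several chunks come from the same file, the source listing repeats filenames. One way to summarize attribution is to keep the best score per file (a sketch; it assumes only the `metadata['filename']` and `.score` fields used above):

```python
def best_source_per_file(source_nodes):
    """Collapse source nodes to {filename: highest_score}.

    Assumes each node has a `metadata` dict with a 'filename' entry and a
    numeric `.score`, as in the loop above.
    """
    best = {}
    for node in source_nodes:
        name = node.metadata.get("filename", "unknown")
        score = node.score if node.score is not None else 0.0
        if name not in best or score > best[name]:
            best[name] = score
    return best
```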
## Index with Tensoras Embeddings

Use Tensoras embeddings when building a LlamaIndex `VectorStoreIndex`:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index_tensoras import TensorasLLM, TensorasEmbedding

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Build index with Tensoras embeddings
embed_model = TensorasEmbedding(model="gte-large-en-v1.5")
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
)

# Query with Tensoras LLM
llm = TensorasLLM(model="llama-3.3-70b")
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("Summarize the key points.")
print(response.response)
```
## Settings

You can set Tensoras as the default LLM and embedding model globally:

```python
from llama_index.core import Settings
from llama_index_tensoras import TensorasLLM, TensorasEmbedding

Settings.llm = TensorasLLM(model="llama-3.3-70b")
Settings.embed_model = TensorasEmbedding(model="gte-large-en-v1.5")

# Now all LlamaIndex components use Tensoras by default
```
## Chat Engine

Build a conversational chat engine:

```python
from llama_index_tensoras import TensorasLLM, TensorasRetriever
from llama_index.core.chat_engine import CondensePlusContextChatEngine

llm = TensorasLLM(model="llama-3.3-70b")
retriever = TensorasRetriever(knowledge_base_id="kb_a1b2c3d4", top_k=5)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=retriever,
    llm=llm,
)

response = chat_engine.chat("What products do you offer?")
print(response.response)

response = chat_engine.chat("Tell me more about the first one.")
print(response.response)
```

The chat engine keeps conversation history between calls, which is how the follow-up question resolves "the first one"; call `chat_engine.reset()` to start a fresh conversation.
## Next Steps

- LangChain Integration — use Tensoras with LangChain
- RAG Overview — how Tensoras RAG works under the hood
- Python SDK — full SDK reference