# LlamaIndex

Use Tensoras.ai as the LLM, embeddings, and retrieval backend in your LlamaIndex applications.
## Installation

```bash
pip install llama-index-tensoras
```

This installs the Tensoras integration for LlamaIndex. It requires `llama-index-core>=0.11`.
## Authentication

Set your API key as an environment variable:

```bash
export TENSORAS_API_KEY="tns_your_key_here"
```

Or pass it directly to each component.
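In scripts it can help to fail fast, with a clear message, when the key is missing. A minimal sketch (the helper name is ours, not part of the integration):

```python
import os

def get_tensoras_api_key() -> str:
    """Return the Tensoras API key from the environment, or raise with a clear message."""
    key = os.environ.get("TENSORAS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "TENSORAS_API_KEY is not set; export it or pass api_key= to each component."
        )
    return key
```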
## LLM

Use `TensorasLLM` as a LlamaIndex LLM:

```python
from llama_index_tensoras import TensorasLLM

llm = TensorasLLM(
    model="llama-3.3-70b",
    api_key="tns_your_key_here",  # or set TENSORAS_API_KEY
    temperature=0.7,
    max_tokens=512,
)

response = llm.complete("Explain RAG in one sentence.")
print(response.text)
```
### Chat

```python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What is hybrid search?"),
]

response = llm.chat(messages)
print(response.message.content)
```
### Streaming

```python
stream = llm.stream_complete("Write a poem about vector databases.")
for chunk in stream:
    print(chunk.delta, end="", flush=True)
```

```python
stream = llm.stream_chat(messages)
for chunk in stream:
    print(chunk.delta, end="", flush=True)
```
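Each chunk exposes its incremental text as `.delta`. If you also need the assembled string, the deltas can simply be concatenated; a sketch against that chunk shape:

```python
def collect_stream(chunks) -> str:
    """Join a stream of chunks into the full response text.

    Assumes each chunk exposes its incremental text as `.delta`, as in the
    loops above.
    """
    return "".join(chunk.delta for chunk in chunks)
```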
## Embeddings

Use `TensorasEmbedding` for document and query embedding:

```python
from llama_index_tensoras import TensorasEmbedding

embed_model = TensorasEmbedding(
    model="gte-large-en-v1.5",
    api_key="tns_your_key_here",  # or set TENSORAS_API_KEY
)

# Embed a single query
query_vector = embed_model.get_query_embedding("What is deep learning?")
print(f"Dimensions: {len(query_vector)}")

# Embed multiple texts
vectors = embed_model.get_text_embedding_batch([
    "Deep learning is a subset of machine learning.",
    "Neural networks have multiple layers.",
])
print(f"Embedded {len(vectors)} texts")
```
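The returned vectors are plain lists of floats, so downstream similarity math needs no extra dependencies. For example, cosine similarity between the query vector and each text vector (a standalone sketch, not part of the integration):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Rank embedded texts against the query, e.g.:
# scores = [cosine_similarity(query_vector, v) for v in vectors]
```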
## Retriever

Use `TensorasRetriever` to retrieve from a Tensoras Knowledge Base:

```python
from llama_index_tensoras import TensorasRetriever

retriever = TensorasRetriever(
    knowledge_base_id="kb_a1b2c3d4",
    api_key="tns_your_key_here",  # or set TENSORAS_API_KEY
    top_k=5,
)

nodes = retriever.retrieve("How do I reset my password?")
for node in nodes:
    print(f"Score: {node.score:.3f}")
    print(node.text[:200])
    print()
```
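Low-scoring matches can be dropped before they reach a prompt. A small filter over the retrieved nodes (our helper; it assumes only the `.score` attribute shown above):

```python
def filter_by_score(nodes, min_score):
    """Keep retrieved nodes whose relevance score meets the threshold.

    Assumes each node exposes a numeric (or None) `.score`, as in the loop above.
    """
    return [n for n in nodes if n.score is not None and n.score >= min_score]
```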
## Query Engine

Build a query engine that retrieves from a Tensoras Knowledge Base and generates answers:

```python
from llama_index_tensoras import TensorasLLM, TensorasRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

llm = TensorasLLM(model="llama-3.3-70b")
retriever = TensorasRetriever(knowledge_base_id="kb_a1b2c3d4", top_k=5)

query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    llm=llm,
)

response = query_engine.query("How do I reset my password?")
print(response.response)

# View source nodes
for node in response.source_nodes:
    print(f"  Source: {node.metadata.get('filename')}, Score: {node.score:.3f}")
```
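When several chunks come from the same file, the source listing repeats filenames. One way to summarize attribution is to keep the best score per file (a sketch; it assumes only the `metadata['filename']` and `.score` fields used above):

```python
def best_source_per_file(source_nodes):
    """Collapse source nodes to {filename: highest_score}.

    Assumes each node has a `metadata` dict with a 'filename' entry and a
    numeric `.score`, as in the loop above.
    """
    best = {}
    for node in source_nodes:
        name = node.metadata.get("filename", "unknown")
        score = node.score if node.score is not None else 0.0
        if name not in best or score > best[name]:
            best[name] = score
    return best
```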
## Index with Tensoras Embeddings

Use Tensoras embeddings when building a LlamaIndex `VectorStoreIndex`:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index_tensoras import TensorasLLM, TensorasEmbedding

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Build index with Tensoras embeddings
embed_model = TensorasEmbedding(model="gte-large-en-v1.5")
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
)

# Query with Tensoras LLM
llm = TensorasLLM(model="llama-3.3-70b")
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("Summarize the key points.")
print(response.response)
```
## Settings

You can set Tensoras as the default LLM and embedding model globally:

```python
from llama_index.core import Settings
from llama_index_tensoras import TensorasLLM, TensorasEmbedding

Settings.llm = TensorasLLM(model="llama-3.3-70b")
Settings.embed_model = TensorasEmbedding(model="gte-large-en-v1.5")

# Now all LlamaIndex components use Tensoras by default
```
## Chat Engine

Build a conversational chat engine:

```python
from llama_index_tensoras import TensorasLLM, TensorasRetriever
from llama_index.core.chat_engine import CondensePlusContextChatEngine

llm = TensorasLLM(model="llama-3.3-70b")
retriever = TensorasRetriever(knowledge_base_id="kb_a1b2c3d4", top_k=5)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=retriever,
    llm=llm,
)

response = chat_engine.chat("What products do you offer?")
print(response.response)

response = chat_engine.chat("Tell me more about the first one.")
print(response.response)
```

The chat engine keeps conversation history between calls, which is how the follow-up question resolves "the first one"; call `chat_engine.reset()` to start a fresh conversation.
## Next Steps

- LangChain Integration — use Tensoras with LangChain
- RAG Overview — how Tensoras RAG works under the hood
- Python SDK — full SDK reference