
Haystack

Use Tensoras.ai as the LLM and embedding provider in Haystack pipelines.

Installation

pip install tensoras-haystack

This installs the Tensoras integration for Haystack. It requires haystack-ai>=2.0.

Authentication

Set your API key as an environment variable:

export TENSORAS_API_KEY="tns_your_key_here"

Or pass it directly to each component.
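The resolution order can be sketched in plain Python (the resolve_api_key helper below is illustrative, not part of tensoras_haystack; it only shows the usual precedence, where an explicit api_key argument wins over the environment variable):

```python
import os

def resolve_api_key(explicit_key=None):
    # Illustrative only: an explicitly passed key takes precedence
    # over the TENSORAS_API_KEY environment variable.
    key = explicit_key or os.environ.get("TENSORAS_API_KEY")
    if key is None:
        raise ValueError("Set TENSORAS_API_KEY or pass api_key explicitly.")
    return key

print(resolve_api_key("tns_example"))
```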

Generator

Use TensorasGenerator for text generation in Haystack pipelines:

from tensoras_haystack import TensorasGenerator
 
generator = TensorasGenerator(
    model="llama-3.3-70b",
    api_key="tns_your_key_here",  # or set TENSORAS_API_KEY
    generation_kwargs={
        "temperature": 0.7,
        "max_tokens": 512,
    },
)
 
result = generator.run(prompt="Explain RAG in one sentence.")
print(result["replies"][0])

Chat Generator

Use TensorasChatGenerator for multi-turn conversations:

from tensoras_haystack import TensorasChatGenerator
from haystack.dataclasses import ChatMessage
 
chat_generator = TensorasChatGenerator(
    model="llama-3.3-70b",
    generation_kwargs={"temperature": 0.7},
)
 
messages = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user("What is hybrid search?"),
]
 
result = chat_generator.run(messages=messages)
print(result["replies"][0].text)

Embedder

Document Embedder

Embed documents for indexing:

from tensoras_haystack import TensorasDocumentEmbedder
from haystack import Document
 
embedder = TensorasDocumentEmbedder(
    model="gte-large-en-v1.5",
    api_key="tns_your_key_here",  # or set TENSORAS_API_KEY
)
 
docs = [
    Document(content="Deep learning is a subset of machine learning."),
    Document(content="Neural networks have multiple layers."),
]
 
result = embedder.run(documents=docs)
 
for doc in result["documents"]:
    print(f"Embedding dimensions: {len(doc.embedding)}")

Text Embedder

Embed queries for retrieval:

from tensoras_haystack import TensorasTextEmbedder
 
text_embedder = TensorasTextEmbedder(
    model="gte-large-en-v1.5",
)
 
result = text_embedder.run(text="What is deep learning?")
print(f"Dimensions: {len(result['embedding'])}")
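Retrieval works by comparing the query embedding against the stored document embeddings, typically with cosine similarity. A minimal pure-Python sketch (the three-dimensional vectors below are toy stand-ins for real embeddings):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_emb = [0.9, 0.1, 0.2]  # toy stand-in for a real query embedding
doc_embs = {
    "deep learning doc": [0.8, 0.2, 0.1],
    "cooking doc": [0.1, 0.9, 0.4],
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(doc_embs, key=lambda name: cosine_similarity(query_emb, doc_embs[name]), reverse=True)
print(ranked[0])
```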

RAG Pipeline

Build a complete RAG pipeline with Tensoras components:

from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from tensoras_haystack import (
    TensorasGenerator,
    TensorasDocumentEmbedder,
    TensorasTextEmbedder,
)
from haystack import Document
 
# 1. Create document store and index documents
document_store = InMemoryDocumentStore()
 
docs = [
    Document(content="Tensoras provides serverless AI inference with built-in RAG."),
    Document(content="Knowledge Bases support hybrid search combining vector and keyword search."),
    Document(content="The rerank endpoint re-scores documents for better relevance."),
]
 
doc_embedder = TensorasDocumentEmbedder(model="gte-large-en-v1.5")
docs_with_embeddings = doc_embedder.run(documents=docs)
document_store.write_documents(docs_with_embeddings["documents"])
 
# 2. Build the RAG pipeline
prompt_template = """
Answer the question based on the following context.
 
Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}
 
Question: {{ query }}
 
Answer:
"""
 
pipeline = Pipeline()
pipeline.add_component("text_embedder", TensorasTextEmbedder(model="gte-large-en-v1.5"))
pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=3))
pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
pipeline.add_component("generator", TensorasGenerator(model="llama-3.3-70b"))
 
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder", "generator")
 
# 3. Run
result = pipeline.run({
    "text_embedder": {"text": "What search methods does Tensoras support?"},
    "prompt_builder": {"query": "What search methods does Tensoras support?"},
})
 
print(result["generator"]["replies"][0])
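Between retrieval and generation, PromptBuilder renders the Jinja template with the retrieved documents. Conceptually it produces a prompt like the one built in this plain-Python sketch (a hand-rolled rendering for illustration, not Haystack's actual implementation):

```python
# Stand-ins for the contents of the retrieved Documents.
documents = [
    "Knowledge Bases support hybrid search combining vector and keyword search.",
    "The rerank endpoint re-scores documents for better relevance.",
]
query = "What search methods does Tensoras support?"

# Mirror the template's {% for doc in documents %} loop.
context = "\n".join(f"- {content}" for content in documents)
prompt = (
    "Answer the question based on the following context.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}\n\nAnswer:"
)
print(prompt)
```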

Streaming

Use the streaming callback for real-time output:

from tensoras_haystack import TensorasChatGenerator
from haystack.dataclasses import ChatMessage
 
def stream_callback(chunk):
    print(chunk.content, end="", flush=True)
 
chat_generator = TensorasChatGenerator(
    model="llama-3.3-70b",
    streaming_callback=stream_callback,
)
 
messages = [
    ChatMessage.from_user("Write a poem about vector databases."),
]
 
chat_generator.run(messages=messages)
print()
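The callback fires once per chunk as tokens arrive, so accumulating the full reply is just a matter of collecting chunk contents. A self-contained sketch (Chunk here is a stand-in dataclass, not Haystack's StreamingChunk, and the chunk strings are invented):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    # Stand-in for Haystack's StreamingChunk; only .content matters here.
    content: str

collected = []

def stream_callback(chunk):
    # In practice you would also print(chunk.content, end="", flush=True).
    collected.append(chunk.content)

# Simulate chunks arriving from the model.
for piece in ["Vectors ", "hum ", "in ", "rows."]:
    stream_callback(Chunk(content=piece))

full_reply = "".join(collected)
print(full_reply)
```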

Serialization

All Tensoras Haystack components are fully serializable for pipeline persistence:

# Save pipeline (Pipeline.dump writes to a file object)
with open("pipeline.yaml", "w") as f:
    pipeline.dump(f)
 
# Load pipeline (Pipeline.load reads from a file object)
with open("pipeline.yaml") as f:
    loaded_pipeline = Pipeline.load(f)
 
result = loaded_pipeline.run({...})

Next Steps