# Haystack

Use Tensoras.ai as the LLM and embeddings provider in Haystack pipelines.
## Installation

```bash
pip install tensoras-haystack
```

This installs the Tensoras integration for Haystack. It requires `haystack-ai>=2.0`.
## Authentication

Set your API key as an environment variable:

```bash
export TENSORAS_API_KEY="tns_your_key_here"
```

Or pass it directly to each component.
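When relying on the environment variable, it can help to fail fast before constructing any components. A plain-Python sketch (the helper name is just for illustration, not part of the integration):

```python
import os

def require_api_key() -> str:
    """Return the Tensoras API key from the environment, or raise early."""
    key = os.environ.get("TENSORAS_API_KEY")
    if not key:
        raise RuntimeError(
            "TENSORAS_API_KEY is not set; export it or pass api_key to each component."
        )
    return key
```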
## Generator

Use `TensorasGenerator` for text generation in Haystack pipelines:

```python
from tensoras_haystack import TensorasGenerator

generator = TensorasGenerator(
    model="llama-3.3-70b",
    api_key="tns_your_key_here",  # or set TENSORAS_API_KEY
    generation_kwargs={
        "temperature": 0.7,
        "max_tokens": 512,
    },
)

result = generator.run(prompt="Explain RAG in one sentence.")
print(result["replies"][0])
```

## Chat Generator
Use `TensorasChatGenerator` for multi-turn conversations:

```python
from tensoras_haystack import TensorasChatGenerator
from haystack.dataclasses import ChatMessage

chat_generator = TensorasChatGenerator(
    model="llama-3.3-70b",
    generation_kwargs={"temperature": 0.7},
)

messages = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user("What is hybrid search?"),
]

result = chat_generator.run(messages=messages)
print(result["replies"][0].text)
```

## Embedder
### Document Embedder

Embed documents for indexing:

```python
from tensoras_haystack import TensorasDocumentEmbedder
from haystack import Document

embedder = TensorasDocumentEmbedder(
    model="gte-large-en-v1.5",
    api_key="tns_your_key_here",  # or set TENSORAS_API_KEY
)

docs = [
    Document(content="Deep learning is a subset of machine learning."),
    Document(content="Neural networks have multiple layers."),
]

result = embedder.run(documents=docs)
for doc in result["documents"]:
    print(f"Embedding dimensions: {len(doc.embedding)}")
```

### Text Embedder
Embed queries for retrieval:

```python
from tensoras_haystack import TensorasTextEmbedder

text_embedder = TensorasTextEmbedder(
    model="gte-large-en-v1.5",
)

result = text_embedder.run(text="What is deep learning?")
print(f"Dimensions: {len(result['embedding'])}")
```

## RAG Pipeline
Build a complete RAG pipeline with Tensoras components:

```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from tensoras_haystack import (
    TensorasGenerator,
    TensorasDocumentEmbedder,
    TensorasTextEmbedder,
)

# 1. Create the document store and index embedded documents
document_store = InMemoryDocumentStore()
docs = [
    Document(content="Tensoras provides serverless AI inference with built-in RAG."),
    Document(content="Knowledge Bases support hybrid search combining vector and keyword search."),
    Document(content="The rerank endpoint re-scores documents for better relevance."),
]
doc_embedder = TensorasDocumentEmbedder(model="gte-large-en-v1.5")
docs_with_embeddings = doc_embedder.run(documents=docs)
document_store.write_documents(docs_with_embeddings["documents"])

# 2. Build the RAG pipeline, starting with the prompt template
prompt_template = """
Answer the question based on the following context.
Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}
Question: {{ query }}
Answer:
"""
```
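For concreteness: if retrieval returns the two most relevant documents from the store above, the Jinja template renders to a prompt roughly like this (exact blank lines depend on Jinja's whitespace handling):

```text
Answer the question based on the following context.
Context:
- Knowledge Bases support hybrid search combining vector and keyword search.
- The rerank endpoint re-scores documents for better relevance.
Question: What search methods does Tensoras support?
Answer:
```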
Then assemble the components and run the pipeline:

```python
pipeline = Pipeline()
pipeline.add_component("text_embedder", TensorasTextEmbedder(model="gte-large-en-v1.5"))
pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=3))
pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
pipeline.add_component("generator", TensorasGenerator(model="llama-3.3-70b"))

# Wire outputs to inputs: query embedding -> retrieval -> prompt -> generation
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder", "generator")

# 3. Run: the same question goes to both the embedder and the prompt builder
question = "What search methods does Tensoras support?"
result = pipeline.run({
    "text_embedder": {"text": question},
    "prompt_builder": {"query": question},
})
print(result["generator"]["replies"][0])
```

## Streaming
Use the streaming callback for real-time output:

```python
from tensoras_haystack import TensorasChatGenerator
from haystack.dataclasses import ChatMessage

def stream_callback(chunk):
    # Called once per streamed chunk; print tokens as they arrive
    print(chunk.content, end="", flush=True)

chat_generator = TensorasChatGenerator(
    model="llama-3.3-70b",
    streaming_callback=stream_callback,
)

messages = [
    ChatMessage.from_user("Write a poem about vector databases."),
]

chat_generator.run(messages=messages)
print()  # final newline after the streamed output
```

## Serialization
All Tensoras Haystack components are fully serializable, so pipelines that use them can be persisted and reloaded. Note that `Pipeline.dump` and `Pipeline.load` take file-like objects:

```python
# Save the pipeline to YAML
with open("pipeline.yaml", "w") as f:
    pipeline.dump(f)

# Load it back
with open("pipeline.yaml", "r") as f:
    loaded_pipeline = Pipeline.load(f)

result = loaded_pipeline.run({...})
```

## Next Steps
- LangChain Integration — use Tensoras with LangChain
- LlamaIndex Integration — use Tensoras with LlamaIndex
- RAG Overview — how Tensoras RAG works
- Python SDK — full SDK reference