RAG Overview
Retrieval-Augmented Generation (RAG) lets you ground model responses in your own data. Instead of relying solely on the model’s training knowledge, RAG retrieves relevant passages from your documents and includes them as context, producing answers that are more accurate, up-to-date, and verifiable.
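The core retrieve-then-generate idea can be shown with a self-contained toy. Here bag-of-words vectors stand in for real learned embeddings (which the Tensoras pipeline uses in practice); the shape of the flow — embed, rank by similarity, inject the best passage into the prompt — is the same:

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

passages = [
    "SSO is configured in the organization security settings.",
    "Billing invoices are emailed on the first of each month.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank every passage by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

context = retrieve("How do I configure SSO?")[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: How do I configure SSO?"
```

The retrieved passage grounds the model: the answer can cite the source text instead of relying on training-time knowledge.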
Tensoras provides a fully managed RAG pipeline built into the platform — no need to wire together a separate vector database, chunking service, and retrieval layer.
How RAG Works on Tensoras
The Tensoras RAG pipeline follows four steps:
- Create a Knowledge Base — a container for your documents and their embeddings.
- Add Data Sources — connect files, S3 buckets, web pages, Confluence, Notion, Google Drive, or other sources.
- Automatic Ingestion — Tensoras chunks your documents, generates embeddings, and indexes them for hybrid search.
- Query — pass `knowledge_bases` in your chat completions request. Tensoras retrieves relevant chunks, injects them into the prompt, and returns the model's answer with citations.
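To make the ingestion step concrete, here is a minimal sketch of one of the chunking strategies named above, fixed-size chunking with overlap. This is illustrative only, not Tensoras's internal implementation:

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Slide a window of `size` characters, stepping by size - overlap,
    # so consecutive chunks share `overlap` characters of context.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "".join(str(i % 10) for i in range(500))
chunks = chunk_fixed(text, size=200, overlap=50)
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side; semantic and hierarchical strategies instead split on meaning or document structure.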
Architecture
```text
┌──────────────┐
│ Data Sources │ File upload, S3, GCS, Web crawl,
│              │ Confluence, Notion, Google Drive
└──────┬───────┘
       │
       ▼
┌──────────────┐
│   Chunking   │ Semantic, fixed-size, recursive, or hierarchical
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  Embedding   │ Generate vector embeddings
└──────┬───────┘
       │
       ▼
┌──────────────┐
│   Indexing   │ Vector + keyword index (hybrid search)
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  Retrieval   │ Hybrid search → Reranking → Context injection
└──────┬───────┘
       │
       ▼
┌──────────────┐
│     LLM      │ Generate answer with citations
└──────────────┘
```

Supported Data Sources
| Connector | Description |
|---|---|
| `file_upload` | Upload PDFs, DOCX, TXT, Markdown, HTML, and other files directly |
| `s3` | Connect an Amazon S3 bucket with prefix filtering |
| `gcs` | Connect a Google Cloud Storage bucket |
| `web_crawl` | Crawl a website starting from a URL with configurable depth |
| `confluence` | Sync pages from Atlassian Confluence spaces |
| `notion` | Sync pages from a Notion workspace |
| `google_drive` | Sync files from Google Drive folders |
See Connectors for detailed configuration for each connector type.
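Connector configuration varies by type; Connectors documents the authoritative schema. As an illustrative sketch only (the `start_url` and `max_depth` field names below are assumptions, not the documented schema), a `web_crawl` source definition might look like:

```python
# Hypothetical web_crawl source configuration (field names assumed,
# not taken from the Connectors reference).
web_crawl_source = {
    "type": "web_crawl",
    "web_crawl": {
        "start_url": "https://docs.example.com",
        "max_depth": 2,  # how many links deep to follow from the start URL
    },
}
```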
Quick Example
1. Create a Knowledge Base
```python
from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

kb = client.knowledge_bases.create(
    name="Product Docs",
    description="Internal product documentation",
    embedding_model="bge-large-en-v1.5",
    chunking_strategy={
        "type": "semantic",
    },
)

print(kb.id)  # "kb_abc123"
```

2. Add a Data Source
```python
data_source = client.knowledge_bases.data_sources.create(
    knowledge_base_id=kb.id,
    type="file_upload",
    file_ids=["file_xyz789"],  # uploaded via the Files API
)
```

3. Query with RAG
Once ingestion completes, pass the knowledge base ID in your chat completions request:
```python
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "How do I configure SSO for my organization?"},
    ],
    knowledge_bases=["kb_abc123"],
)

print(response.choices[0].message.content)

# Access citations
for citation in response.citations:
    print(f"Source: {citation.source}")
    print(f"Text: {citation.text}")
    print(f"Score: {citation.score}")
    print()
```

Node.js
```javascript
import Tensoras from "tensoras";

const client = new Tensoras({ apiKey: "tns_your_key_here" });

const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "user", content: "How do I configure SSO for my organization?" },
  ],
  knowledgeBases: ["kb_abc123"],
});

console.log(response.choices[0].message.content);

// Access citations
for (const citation of response.citations) {
  console.log(`Source: ${citation.source}`);
  console.log(`Text: ${citation.text}`);
  console.log(`Score: ${citation.score}`);
}
```

Citations
When you query with `knowledge_bases`, the response includes a `citations` array. Each citation identifies the source document, the relevant text chunk, and a relevance score, so you can show users exactly where the information came from. See Citations for details.
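A minimal sketch of turning those fields into user-facing attribution, using a stand-in `Citation` type that mirrors the `source`, `text`, and `score` fields from the response:

```python
from dataclasses import dataclass

@dataclass
class Citation:
    # Stand-in for a citation object: same fields the response exposes.
    source: str
    text: str
    score: float

def format_with_footnotes(answer: str, citations: list[Citation]) -> str:
    # Append numbered footnotes so readers can trace each claim to its source.
    notes = "\n".join(
        f"[{i}] {c.source} (score {c.score:.2f}): {c.text}"
        for i, c in enumerate(citations, start=1)
    )
    return f"{answer}\n\nSources:\n{notes}"

rendered = format_with_footnotes(
    "SSO is configured in the organization security settings.",
    [Citation(source="docs/sso.md", text="SSO is configured under Security.", score=0.92)],
)
```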
Configuration Options
Tensoras gives you control over the key components of the RAG pipeline:
- Chunking strategy — choose semantic, fixed-size, recursive, or hierarchical chunking. See Chunking Strategies.
- Search type — hybrid (default), vector-only, or keyword-only. See Hybrid Search.
- Reranking — optional second-pass reranking for improved relevance. See Hybrid Search.
- Connectors — configure sync schedules and incremental updates for each data source. See Connectors.
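The document does not specify how Tensoras fuses the vector and keyword result lists; reciprocal rank fusion is one common way hybrid search systems merge ranked lists, shown here as a sketch:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each document scores sum(1 / (k + rank)) across every list it appears in,
    # rewarding documents that rank well in both vector and keyword results.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

`doc_b` wins because it places highly in both lists, even though neither list ranks it first and second simultaneously; that is the behavior hybrid search buys over either index alone.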
Agentic RAG with the Responses API
For more complex retrieval workflows, use the Responses API. It runs a multi-turn agentic loop where the model decides when and how to search your Knowledge Bases, can issue multiple searches, and produces a final answer — all in a single request.
```python
response = client.responses.create(
    model="llama-3.3-70b",
    input="Compare our Q3 and Q4 revenue numbers.",
    instructions="Answer based only on the provided documents.",
    tools=[{
        "type": "file_search",
        "file_search": {
            "knowledge_base_ids": ["kb_finance_2024"],
            "max_results": 10,
            "rerank": True,
        },
    }],
    max_turns=5,
)
```

This is the recommended approach for agentic workflows that combine LLM reasoning with knowledge base retrieval.
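Conceptually, the loop the Responses API runs can be pictured with a toy stand-in model. This is illustrative only: in the real service the LLM itself decides when to issue `file_search` calls, and the search hits a Knowledge Base rather than a stub:

```python
def toy_model(history: list[str]) -> dict:
    # Stand-in for the LLM: search until results are in context, then answer.
    if not any(h.startswith("RESULTS:") for h in history):
        return {"action": "search", "query": "Q3 and Q4 revenue"}
    return {"action": "answer", "text": "Q4 revenue grew over Q3."}

def search_kb(query: str) -> str:
    # Stand-in for file_search against a knowledge base.
    return f"RESULTS: passages matching {query!r}"

def run_agent(question: str, max_turns: int = 5) -> str:
    history = [question]
    for _ in range(max_turns):
        step = toy_model(history)
        if step["action"] == "search":
            history.append(search_kb(step["query"]))  # feed results back in
        else:
            return step["text"]
    return "max turns reached"

answer = run_agent("Compare our Q3 and Q4 revenue numbers.")
```

`max_turns` bounds the loop the same way the request parameter does: the model may search several times, but must answer within the budget.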
Related
- Responses API — agentic multi-turn RAG with tool calling
- Chunking Strategies — configure how documents are split
- Hybrid Search — vector + keyword search with reranking
- Citations — source attribution in RAG responses
- Connectors — data source configuration and sync
- Knowledge Bases API — API reference for KB management
- Data Sources API — API reference for data source configuration
- Retrieval API — API reference for direct retrieval queries
- Ingestion Jobs API — track ingestion progress