RAG Overview
Retrieval-Augmented Generation (RAG) lets you ground model responses in your own data. Instead of relying solely on the model’s training knowledge, RAG retrieves relevant passages from your documents and includes them as context, producing answers that are more accurate, up-to-date, and verifiable.
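The core retrieve-then-generate idea can be shown with a self-contained toy. Here bag-of-words vectors stand in for real learned embeddings (which the Tensoras pipeline uses in practice); the shape of the flow — embed, rank by similarity, inject the best passage into the prompt — is the same:

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

passages = [
    "SSO is configured in the organization security settings.",
    "Billing invoices are emailed on the first of each month.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank every passage by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

context = retrieve("How do I configure SSO?")[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: How do I configure SSO?"
```

The retrieved passage grounds the model: the answer can cite the source text instead of relying on training-time knowledge.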
Tensoras provides a fully managed RAG pipeline built into the platform — no need to wire together a separate vector database, chunking service, and retrieval layer.
How RAG Works on Tensoras
The Tensoras RAG pipeline follows four steps:
- Create a Knowledge Base — a container for your documents and their embeddings.
- Add Data Sources — connect files, S3 buckets, web pages, Confluence, Notion, Google Drive, or other sources.
- Automatic Ingestion — Tensoras chunks your documents, generates embeddings, and indexes them for hybrid search.
- Query — pass `knowledge_bases` in your chat completions request. Tensoras retrieves relevant chunks, injects them into the prompt, and returns the model's answer with citations.
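To make the ingestion step concrete, here is a minimal sketch of one of the chunking strategies named above, fixed-size chunking with overlap. This is illustrative only, not Tensoras's internal implementation:

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Slide a window of `size` characters, stepping by size - overlap,
    # so consecutive chunks share `overlap` characters of context.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "".join(str(i % 10) for i in range(500))
chunks = chunk_fixed(text, size=200, overlap=50)
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side; semantic and hierarchical strategies instead split on meaning or document structure.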
Architecture
```text
┌──────────────┐
│ Data Sources │ File upload, S3, GCS, Web crawl,
│              │ Confluence, Notion, Google Drive
└──────┬───────┘
       │
       ▼
┌──────────────┐
│   Chunking   │ Semantic, fixed-size, recursive, or hierarchical
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  Embedding   │ Generate vector embeddings
└──────┬───────┘
       │
       ▼
┌──────────────┐
│   Indexing   │ Vector + keyword index (hybrid search)
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  Retrieval   │ Hybrid search → Reranking → Context injection
└──────┬───────┘
       │
       ▼
┌──────────────┐
│     LLM      │ Generate answer with citations
└──────────────┘
```

Supported Data Sources
| Connector | Description |
|---|---|
| `file_upload` | Upload PDFs, DOCX, TXT, Markdown, HTML, and other files directly |
| `s3` | Connect an Amazon S3 bucket with prefix filtering |
| `gcs` | Connect a Google Cloud Storage bucket |
| `web_crawl` | Crawl a website starting from a URL with configurable depth |
| `confluence` | Sync pages from Atlassian Confluence spaces |
| `notion` | Sync pages from a Notion workspace |
| `google_drive` | Sync files from Google Drive folders |
See Connectors for detailed configuration for each connector type.
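Connector configuration varies by type; Connectors documents the authoritative schema. As an illustrative sketch only (the `start_url` and `max_depth` field names below are assumptions, not the documented schema), a `web_crawl` source definition might look like:

```python
# Hypothetical web_crawl source configuration (field names assumed,
# not taken from the Connectors reference).
web_crawl_source = {
    "type": "web_crawl",
    "web_crawl": {
        "start_url": "https://docs.example.com",
        "max_depth": 2,  # how many links deep to follow from the start URL
    },
}
```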
Quick Example
1. Create a Knowledge Base
```python
from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

kb = client.knowledge_bases.create(
    name="Product Docs",
    description="Internal product documentation",
    embedding_model="bge-large-en-v1.5",
    chunking_strategy={
        "type": "semantic",
    },
)

print(kb.id)  # "kb_abc123"
```

2. Add a Data Source
```python
data_source = client.knowledge_bases.data_sources.create(
    knowledge_base_id=kb.id,
    type="file_upload",
    file_ids=["file_xyz789"],  # uploaded via the Files API
)
```

3. Query with RAG
Once ingestion completes, pass the knowledge base ID in your chat completions request:
```python
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "How do I configure SSO for my organization?"},
    ],
    knowledge_bases=["kb_abc123"],
)

print(response.choices[0].message.content)

# Access citations
for citation in response.citations:
    print(f"Source: {citation.source}")
    print(f"Text: {citation.text}")
    print(f"Score: {citation.score}")
    print()
```

Node.js
```javascript
import Tensoras from "tensoras";

const client = new Tensoras({ apiKey: "tns_your_key_here" });

const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "user", content: "How do I configure SSO for my organization?" },
  ],
  knowledgeBases: ["kb_abc123"],
});

console.log(response.choices[0].message.content);

// Access citations
for (const citation of response.citations) {
  console.log(`Source: ${citation.source}`);
  console.log(`Text: ${citation.text}`);
  console.log(`Score: ${citation.score}`);
}
```

Citations
When you query with `knowledge_bases`, the response includes a `citations` array. Each citation identifies the source document, the relevant text chunk, and a relevance score, so you can show users exactly where the information came from. See Citations for details.
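A minimal sketch of turning those fields into user-facing attribution, using a stand-in `Citation` type that mirrors the `source`, `text`, and `score` fields from the response:

```python
from dataclasses import dataclass

@dataclass
class Citation:
    # Stand-in for a citation object: same fields the response exposes.
    source: str
    text: str
    score: float

def format_with_footnotes(answer: str, citations: list[Citation]) -> str:
    # Append numbered footnotes so readers can trace each claim to its source.
    notes = "\n".join(
        f"[{i}] {c.source} (score {c.score:.2f}): {c.text}"
        for i, c in enumerate(citations, start=1)
    )
    return f"{answer}\n\nSources:\n{notes}"

rendered = format_with_footnotes(
    "SSO is configured in the organization security settings.",
    [Citation(source="docs/sso.md", text="SSO is configured under Security.", score=0.92)],
)
```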
Configuration Options
Tensoras gives you control over the key components of the RAG pipeline:
- Chunking strategy — choose semantic, fixed-size, recursive, or hierarchical chunking. See Chunking Strategies.
- Search type — hybrid (default), vector-only, or keyword-only. See Hybrid Search.
- Reranking — optional second-pass reranking for improved relevance. See Hybrid Search.
- Connectors — configure sync schedules and incremental updates for each data source. See Connectors.
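The document does not specify how Tensoras fuses the vector and keyword result lists; reciprocal rank fusion is one common way hybrid search systems merge ranked lists, shown here as a sketch:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each document scores sum(1 / (k + rank)) across every list it appears in,
    # rewarding documents that rank well in both vector and keyword results.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

`doc_b` wins because it places highly in both lists, even though neither list ranks it first and second simultaneously; that is the behavior hybrid search buys over either index alone.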
Agentic RAG with the Responses API
For more complex retrieval workflows, use the Responses API. It runs a multi-turn agentic loop where the model decides when and how to search your Knowledge Bases, can issue multiple searches, and produces a final answer — all in a single request.
```python
response = client.responses.create(
    model="llama-3.3-70b",
    input="Compare our Q3 and Q4 revenue numbers.",
    instructions="Answer based only on the provided documents.",
    tools=[{
        "type": "file_search",
        "file_search": {
            "knowledge_base_ids": ["kb_finance_2024"],
            "max_results": 10,
            "rerank": True,
        },
    }],
    max_turns=5,
)
```

This is the recommended approach for agentic workflows that combine LLM reasoning with knowledge base retrieval.
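Conceptually, the loop the Responses API runs can be pictured with a toy stand-in model. This is illustrative only: in the real service the LLM itself decides when to issue `file_search` calls, and the search hits a Knowledge Base rather than a stub:

```python
def toy_model(history: list[str]) -> dict:
    # Stand-in for the LLM: search until results are in context, then answer.
    if not any(h.startswith("RESULTS:") for h in history):
        return {"action": "search", "query": "Q3 and Q4 revenue"}
    return {"action": "answer", "text": "Q4 revenue grew over Q3."}

def search_kb(query: str) -> str:
    # Stand-in for file_search against a knowledge base.
    return f"RESULTS: passages matching {query!r}"

def run_agent(question: str, max_turns: int = 5) -> str:
    history = [question]
    for _ in range(max_turns):
        step = toy_model(history)
        if step["action"] == "search":
            history.append(search_kb(step["query"]))  # feed results back in
        else:
            return step["text"]
    return "max turns reached"

answer = run_agent("Compare our Q3 and Q4 revenue numbers.")
```

`max_turns` bounds the loop the same way the request parameter does: the model may search several times, but must answer within the budget.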
Related
- Responses API — agentic multi-turn RAG with tool calling
- Chunking Strategies — configure how documents are split
- Hybrid Search — vector + keyword search with reranking
- Citations — source attribution in RAG responses
- Connectors — data source configuration and sync
- Knowledge Bases API — API reference for KB management
- Data Sources API — API reference for data source configuration
- Retrieval API — API reference for direct retrieval queries
- Ingestion Jobs API — track ingestion progress