
RAG Overview

Retrieval-Augmented Generation (RAG) lets you ground model responses in your own data. Instead of relying solely on the model’s training knowledge, RAG retrieves relevant passages from your documents and includes them as context, producing answers that are more accurate, up-to-date, and verifiable.

Tensoras provides a fully managed RAG pipeline built into the platform — no need to wire together a separate vector database, chunking service, and retrieval layer.

How RAG Works on Tensoras

The Tensoras RAG pipeline follows four steps:

  1. Create a Knowledge Base — a container for your documents and their embeddings.
  2. Add Data Sources — connect files, S3 buckets, web pages, Confluence, Notion, Google Drive, or other sources.
  3. Automatic Ingestion — Tensoras chunks your documents, generates embeddings, and indexes them for hybrid search.
  4. Query — pass knowledge_bases in your chat completions request. Tensoras retrieves relevant chunks, injects them into the prompt, and returns the model’s answer with citations.
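Steps 3 and 4 can be pictured with a toy example: score stored chunks against the query embedding and inject the best match into the prompt. This is an illustration only, not the Tensoras internals — the made-up vectors stand in for real embeddings, and actual retrieval uses hybrid search rather than plain cosine similarity.

```python
# Toy illustration of steps 3-4: embed, retrieve by similarity, inject
# context. The vectors below are made up; real ones come from the
# embedding model, and Tensoras retrieves with hybrid search.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

chunks = {
    "SSO is configured under organization security settings.": [0.9, 0.1, 0.0],
    "Invoices are emailed at the start of each month.": [0.1, 0.8, 0.2],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the user's question

best = max(chunks, key=lambda text: cosine(chunks[text], query_vec))
prompt = f"Context:\n{best}\n\nQuestion: How do I configure SSO?"
```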

Architecture

┌──────────────┐
│ Data Sources │  File upload, S3, GCS, Web crawl,
│              │  Confluence, Notion, Google Drive
└──────┬───────┘
       ▼
┌──────────────┐
│   Chunking   │  Semantic, fixed-size, recursive, or hierarchical
└──────┬───────┘
       ▼
┌──────────────┐
│  Embedding   │  Generate vector embeddings
└──────┬───────┘
       ▼
┌──────────────┐
│   Indexing   │  Vector + keyword index (hybrid search)
└──────┬───────┘
       ▼
┌──────────────┐
│  Retrieval   │  Hybrid search → Reranking → Context injection
└──────┬───────┘
       ▼
┌──────────────┐
│     LLM      │  Generate answer with citations
└──────────────┘
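The retrieval stage merges the vector and keyword result lists into one ranking. A common way to do this kind of hybrid merge is reciprocal rank fusion (RRF); the sketch below is illustrative only and does not claim to be the exact fusion method Tensoras applies.

```python
# Sketch of a hybrid-search merge using reciprocal rank fusion (RRF).
# Illustrative only: the actual fusion Tensoras applies is not
# documented here.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of either list accumulate more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["chunk_a", "chunk_b", "chunk_c"]   # from the vector index
keyword_hits = ["chunk_b", "chunk_d", "chunk_a"]  # from the keyword index
fused = rrf([vector_hits, keyword_hits])
```

Chunks that appear high in both lists (here `chunk_b`) float to the top of the fused ranking.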

Supported Data Sources

Connector      Description
file_upload    Upload PDFs, DOCX, TXT, Markdown, HTML, and other files directly
s3             Connect an Amazon S3 bucket with prefix filtering
gcs            Connect a Google Cloud Storage bucket
web_crawl      Crawl a website starting from a URL with configurable depth
confluence     Sync pages from Atlassian Confluence spaces
notion         Sync pages from a Notion workspace
google_drive   Sync files from Google Drive folders

See Connectors for the detailed configuration of each connector type.
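As a sketch of what a connector definition might look like, here is a hypothetical S3 data source. The `bucket` and `prefix` field names are assumptions for illustration (the table above only mentions prefix filtering), so confirm the exact schema against the Connectors reference.

```python
# Hypothetical shape of an S3 data source; "bucket" and "prefix" are
# assumed field names, not confirmed API fields.
s3_source = {
    "type": "s3",
    "bucket": "my-docs-bucket",
    "prefix": "product-docs/",
}
```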

Quick Example

1. Create a Knowledge Base

from tensoras import Tensoras
 
client = Tensoras(api_key="tns_your_key_here")
 
kb = client.knowledge_bases.create(
    name="Product Docs",
    description="Internal product documentation",
    embedding_model="bge-large-en-v1.5",
    chunking_strategy={
        "type": "semantic",
    },
)
 
print(kb.id)  # "kb_abc123"

2. Add a Data Source

data_source = client.knowledge_bases.data_sources.create(
    knowledge_base_id=kb.id,
    type="file_upload",
    file_ids=["file_xyz789"],  # uploaded via the Files API
)

3. Query with RAG

Once ingestion completes, pass the knowledge base ID in your chat completions request:

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "How do I configure SSO for my organization?"},
    ],
    knowledge_bases=["kb_abc123"],
)
 
print(response.choices[0].message.content)
 
# Access citations
for citation in response.citations:
    print(f"Source: {citation.source}")
    print(f"Text: {citation.text}")
    print(f"Score: {citation.score}")
    print()

Node.js

import Tensoras from "tensoras";
 
const client = new Tensoras({ apiKey: "tns_your_key_here" });
 
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "user", content: "How do I configure SSO for my organization?" },
  ],
  knowledgeBases: ["kb_abc123"],
});
 
console.log(response.choices[0].message.content);
 
// Access citations
for (const citation of response.citations) {
  console.log(`Source: ${citation.source}`);
  console.log(`Text: ${citation.text}`);
  console.log(`Score: ${citation.score}`);
}

Citations

When you query with knowledge_bases, the response includes a citations array. Each citation identifies the source document, the relevant text chunk, and a relevance score. This lets you show users exactly where the information came from. See Citations for details.
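A typical use of the citations array is rendering a numbered source list under the answer. The sketch below assumes only the `source`, `text`, and `score` fields shown in the examples above, with stand-in data.

```python
# Render a numbered source list from citations. Assumes only the
# source, text, and score fields shown in the examples above;
# the citation values here are stand-ins.
citations = [
    {"source": "docs/sso.md", "text": "SSO setup steps...", "score": 0.91},
    {"source": "docs/admin.md", "text": "Admin roles...", "score": 0.78},
]
lines = [
    f"[{i}] {c['source']} (score {c['score']:.2f})"
    for i, c in enumerate(citations, start=1)
]
```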

Configuration Options

Tensoras gives you control over the key components of the RAG pipeline:

  • Chunking strategy — choose semantic, fixed-size, recursive, or hierarchical chunking. See Chunking Strategies.
  • Search type — hybrid (default), vector-only, or keyword-only. See Hybrid Search.
  • Reranking — optional second-pass reranking for improved relevance. See Hybrid Search.
  • Connectors — configure sync schedules and incremental updates for each data source. See Connectors.
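For intuition about the simplest of these strategies, here is a minimal fixed-size chunker with character-level overlap in plain Python. The `size` and `overlap` parameter names are illustrative, not the exact fields the Tensoras API accepts — see Chunking Strategies for those.

```python
# Minimal fixed-size chunker with character-level overlap. Parameter
# names are illustrative, not the exact Tensoras API fields.
def chunk_fixed(text, size=200, overlap=50):
    step = size - overlap  # advance by size minus the shared overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 500
pieces = chunk_fixed(doc)
# 500 chars -> three 200-char chunks, each sharing 50 chars with the next
```

Overlap keeps a sentence that straddles a chunk boundary fully present in at least one chunk, at the cost of some duplicated index entries.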

Agentic RAG with the Responses API

For more complex retrieval workflows, use the Responses API. It runs a multi-turn agentic loop where the model decides when and how to search your Knowledge Bases, can issue multiple searches, and produces a final answer — all in a single request.

response = client.responses.create(
    model="llama-3.3-70b",
    input="Compare our Q3 and Q4 revenue numbers.",
    instructions="Answer based only on the provided documents.",
    tools=[{
        "type": "file_search",
        "file_search": {
            "knowledge_base_ids": ["kb_finance_2024"],
            "max_results": 10,
            "rerank": True,
        },
    }],
    max_turns=5,
)
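Conceptually, the loop this request runs on your behalf looks like the following sketch, where `decide` plays the model and `search` plays the file_search tool. Neither function is a Tensoras API; they are stand-ins to show the control flow.

```python
# Stand-in sketch of the agentic loop: "decide" plays the model,
# "search" plays the file_search tool. Neither is a Tensoras API.
def agentic_loop(question, search, decide, max_turns=5):
    context = []
    for _ in range(max_turns):
        action = decide(question, context)
        if action["type"] == "answer":
            return action["text"]
        context.extend(search(action["query"]))  # another retrieval turn
    return "(no answer within max_turns)"

def decide(question, context):
    if not context:  # nothing retrieved yet: ask for a search
        return {"type": "search", "query": question}
    return {"type": "answer", "text": f"Based on {len(context)} chunk(s)."}

def search(query):
    return ["Q3 revenue chunk", "Q4 revenue chunk"]

answer = agentic_loop("Compare our Q3 and Q4 revenue numbers.", search, decide)
```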

This is the recommended approach for agentic workflows that combine LLM reasoning with knowledge base retrieval.