Hybrid Search

Tensoras Knowledge Bases use hybrid search by default, combining vector search (semantic similarity) with keyword search (BM25). Semantic search captures meaning and intent, while keyword search catches exact terms and phrases, such as product names or error codes, that vector search might miss.

How It Works

When you query a Knowledge Base, Tensoras runs two searches in parallel:

Vector search — Embeds your query and finds the most semantically similar chunks using cosine similarity.
Keyword search (BM25) — Performs a traditional term-frequency-based search over the indexed text, matching exact words and phrases.

Results from both searches are merged using Reciprocal Rank Fusion (RRF), which produces a single ranked list that benefits from both signals.
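To make the two signals concrete, here is a toy, in-memory sketch of each search over three hypothetical chunks. It is not Tensoras's implementation. The hand-made 3-dimensional vectors stand in for real embeddings, the regex word tokenizer is an assumption, and the BM25 parameters `k1=1.5` and `b=0.75` are common textbook defaults rather than documented Tensoras settings.

```python
import math
import re
from collections import Counter

# Hypothetical chunks standing in for an indexed Knowledge Base.
CHUNKS = {
    "c1": "Configure single sign-on with SAML",
    "c2": "Reset your password from the login page",
    "c3": "Error code E1042 means the SAML certificate expired",
}
# Hand-made 3-dimensional "embeddings"; a real system would get these from an embedding model.
CHUNK_VECS = {"c1": [0.9, 0.1, 0.0], "c2": [0.1, 0.9, 0.0], "c3": [0.6, 0.0, 0.8]}
QUERY_VEC = [1.0, 0.0, 0.1]


def tokenize(text):
    """Lowercase word tokens (an assumed, deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_rank(query_vec, chunk_vecs):
    """Chunk IDs sorted by cosine similarity to the query vector, best first."""
    return sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)


def bm25_rank(query, chunks, k1=1.5, b=0.75):
    """Chunk IDs with a nonzero BM25 score for the query, best first."""
    docs = {cid: tokenize(text) for cid, text in chunks.items()}
    n = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n
    df = Counter(term for toks in docs.values() for term in set(toks))  # document frequency
    scores = {}
    for cid, toks in docs.items():
        tf = Counter(toks)  # term frequency within this chunk
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        if score > 0:
            scores[cid] = score
    return sorted(scores, key=scores.get, reverse=True)


print("vector: ", vector_rank(QUERY_VEC, CHUNK_VECS))
print("keyword:", bm25_rank("SAML E1042", CHUNKS))
```

With these inputs the keyword ranking puts c3 first because it contains the exact code E1042, while the (hand-made) vector ranking favors c1. Each search produces only a ranked list of chunk IDs, which is exactly what the fusion step consumes.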

Reciprocal Rank Fusion (RRF)

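RRF is a simple, effective method for combining ranked lists. Because it uses only each chunk’s position in each list, it needs no normalization between cosine similarities and BM25 scores, which are on different scales. For each chunk, the fused score is: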

RRF_score(chunk) = Σ_i 1 / (k + rank_i)

Here rank_i is the chunk’s rank in result list i, the sum runs over every result list that contains the chunk, and k is a constant (default 60) that keeps the top few ranks from dominating the fused score. Chunks that appear high in both lists get the highest fused scores. Chunks that appear in only one list still contribute, which is why hybrid search catches results that either method alone might miss.
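The formula translates almost line for line into code. This sketch assumes 1-based ranks and the default k = 60; the function name and chunk IDs are hypothetical, and Tensoras's server-side fusion may differ in details such as tie-breaking.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse best-first lists of chunk IDs with Reciprocal Rank Fusion.

    Ranks are assumed to be 1-based. A chunk missing from a list simply
    receives no contribution from that list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_ranking = ["c1", "c2", "c3"]   # e.g. from vector search
keyword_ranking = ["c3", "c1"]        # e.g. from BM25; c2 matched no query terms
for chunk_id, score in reciprocal_rank_fusion([vector_ranking, keyword_ranking]):
    print(f"{chunk_id}: {score:.5f}")
```

Here c3, only third in the vector list, overtakes c2 because keyword search ranked it first: 1/63 + 1/61 ≈ 0.0323 versus 1/62 ≈ 0.0161 for c2.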

Search Types

You can control the search mode via the search_type parameter when using the Retrieval API or when configuring Knowledge Base queries:

Search Type	Description
`hybrid` (default)	Vector + keyword search with RRF fusion
`vector`	Semantic similarity only
`keyword`	BM25 keyword matching only

Python — Direct Retrieval

from tensoras import Tensoras
 
client = Tensoras(api_key="tns_your_key_here")
 
# Hybrid search (default)
results = client.retrieval.query(
    knowledge_base_id="kb_abc123",
    query="How do I set up single sign-on?",
    search_type="hybrid",
    top_k=5,
)
 
for result in results.chunks:
    print(f"Score: {result.score:.4f}")
    print(f"Source: {result.source}")
    print(f"Text: {result.text[:200]}...")
    print()

Node.js — Direct Retrieval

import Tensoras from "tensoras";
 
const client = new Tensoras({ apiKey: "tns_your_key_here" });
 
const results = await client.retrieval.query({
  knowledgeBaseId: "kb_abc123",
  query: "How do I set up single sign-on?",
  searchType: "hybrid",
  topK: 5,
});
 
for (const result of results.chunks) {
  console.log(`Score: ${result.score.toFixed(4)}`);
  console.log(`Source: ${result.source}`);
  console.log(`Text: ${result.text.slice(0, 200)}...`);
}

Using with Chat Completions

When you pass knowledge_bases in a chat completions request, hybrid search runs automatically behind the scenes:

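from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "What is our refund policy?"},
    ],
    knowledge_bases=["kb_abc123"],  # hybrid retrieval runs over this Knowledge Base
)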

When to Use Which Mode

Mode	Best For
Hybrid (default)	Most queries — combines semantic understanding with exact matching
Vector	Conceptual or paraphrased queries where exact terms do not matter
Keyword	Queries with specific names, IDs, error codes, or exact phrases

Recommendation: Start with the default hybrid mode. Switch to vector or keyword mode only if you find specific query patterns where one signal dominates.
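One way to act on this recommendation is to run a sample of real queries through all three modes and see where their results diverge. The sketch below uses hypothetical helpers (`compare_search_types` and `jaccard` are not part of the Tensoras SDK). It takes a `run_query(query, search_type)` callable so it can be tried offline with canned data; the commented adapter uses only the `client.retrieval.query` parameters and the `chunk.source` attribute shown on this page.

```python
SEARCH_TYPES = ("hybrid", "vector", "keyword")


def jaccard(a, b):
    """Set overlap of two result lists: 1.0 means the same sources, 0.0 means none shared."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def compare_search_types(run_query, query):
    """Return {search_type: (sources, overlap_with_hybrid)} for a single query.

    run_query(query, search_type) must return a best-first list of source names.
    """
    results = {st: run_query(query, st) for st in SEARCH_TYPES}
    return {st: (sources, jaccard(sources, results["hybrid"])) for st, sources in results.items()}


# Adapter for the Tensoras client from the examples above (needs a real key and Knowledge Base):
#
# def run_query(query, search_type):
#     results = client.retrieval.query(
#         knowledge_base_id="kb_abc123", query=query, search_type=search_type, top_k=5
#     )
#     return [chunk.source for chunk in results.chunks]

# Offline demo: canned results stand in for real API calls.
canned = {
    "hybrid": ["sso-guide.md", "errors.md"],
    "vector": ["sso-guide.md", "saml-faq.md"],
    "keyword": ["errors.md", "sso-guide.md"],
}
report = compare_search_types(lambda query, search_type: canned[search_type], "Error E1042 during SSO login")
for search_type, (sources, overlap) in report.items():
    print(f"{search_type:8} overlap with hybrid = {overlap:.2f}  {sources}")
```

A low overlap only tells you the modes disagree on that query; reading those results side by side is how you find the query patterns where one signal dominates.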

Reranking

To further improve relevance, Tensoras supports optional second-pass reranking. After the initial retrieval (hybrid, vector, or keyword), a cross-encoder reranker re-scores the top results by jointly attending to the query and each chunk.

Reranking is more computationally expensive than the initial retrieval, because the cross-encoder must process every candidate (query, chunk) pair, but it can significantly improve precision, especially when the initial result set is noisy.
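The retrieve-then-rerank flow can be sketched in a few lines. This is not Tensoras's reranker: `rerank` and `token_overlap` are hypothetical names, and the toy word-overlap scorer merely stands in for a cross-encoder so the example runs offline. It does show the shape of the computation: the scorer runs once per (query, candidate) pair, and the number of results kept defaults to the number of candidates, mirroring the `rerank_top_k` default listed in the parameter table for this section.

```python
import re


def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def token_overlap(query, chunk_text):
    """Toy relevance score standing in for a cross-encoder: share of query words in the chunk."""
    q = words(query)
    return len(q & words(chunk_text)) / len(q) if q else 0.0


def rerank(query, candidates, score_pair, rerank_top_k=None):
    """Rescore first-pass candidates with score_pair and keep the best rerank_top_k.

    rerank_top_k defaults to len(candidates), i.e. the same as the first-pass top_k.
    """
    if rerank_top_k is None:
        rerank_top_k = len(candidates)
    # One scorer call per (query, chunk) pair: this is where the extra cost comes from.
    scored = [(score_pair(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:rerank_top_k]]


# Pretend these are the top_k=3 chunks returned by first-pass hybrid search.
candidates = [
    "Reset your password from the login page",
    "Configure SAML SSO in the admin console",
    "SAML certificates expire after one year",
]
print(rerank("How do I configure SAML SSO?", candidates, token_overlap, rerank_top_k=2))
```

Because the scorer sees the query and the chunk together, a real cross-encoder can judge relevance more precisely than first-pass retrieval, at the price of one model call per candidate. That is why reranking is applied only to a small first-pass result set.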

Enabling Reranking

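Python

from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

results = client.retrieval.query(
    knowledge_base_id="kb_abc123",
    query="How do I configure SAML SSO?",
    search_type="hybrid",
    top_k=10,  # candidates retrieved in the first pass
    rerank=True,
    rerank_top_k=5,  # return top 5 after reranking
)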

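Node.js

import Tensoras from "tensoras";

const client = new Tensoras({ apiKey: "tns_your_key_here" });

const results = await client.retrieval.query({
  knowledgeBaseId: "kb_abc123",
  query: "How do I configure SAML SSO?",
  searchType: "hybrid",
  topK: 10, // candidates retrieved in the first pass
  rerank: true,
  rerankTopK: 5, // return top 5 after reranking
});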

Parameter	Default	Description
`rerank`	`false`	Enable second-pass reranking
`rerank_top_k`	Same as `top_k`	Number of results to return after reranking (`rerankTopK` in Node.js)

Related

RAG Overview — end-to-end RAG pipeline
Chunking Strategies — how documents are split before indexing
Citations — source attribution in RAG responses
Retrieval API — full retrieval endpoint reference
Rerank API — standalone reranking endpoint
