Hybrid Search
Tensoras Knowledge Bases use hybrid search by default, combining vector search (semantic similarity) with keyword search (BM25). Semantic search captures meaning and intent, while keyword search catches exact terms and phrases that vector search might miss.
How It Works
When you query a Knowledge Base, Tensoras runs two searches in parallel:
- Vector search — Embeds your query and finds the most semantically similar chunks using cosine similarity.
- Keyword search (BM25) — Performs a traditional term-frequency-based search over the indexed text, matching exact words and phrases.
Results from both searches are merged using Reciprocal Rank Fusion (RRF), which produces a single ranked list that benefits from both signals.
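To make the two signals concrete, here is a minimal sketch of cosine similarity and BM25 scoring in plain Python. It is illustrative only: the whitespace tokenizer, the `k1` and `b` values, and the sample documents are assumptions, and the source does not describe how Tensoras implements either search internally.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25 (Lucene-style IDF)."""
    # Assumption: lowercase whitespace tokenization, chosen for brevity.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # BM25 only rewards exact term matches
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical sample documents; only the second contains the error code.
docs = [
    "reset your password from the account settings page",
    "error E1042 occurs when the SSO certificate has expired",
]
scores = bm25_scores("E1042", docs)
```

In this toy example only the second document gets a non-zero BM25 score, because it is the only one that contains the exact token. This is the kind of match that an embedding-based search can miss for identifiers and error codes.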
Reciprocal Rank Fusion (RRF)
RRF is a simple, effective method for combining ranked lists. For each chunk, the fused score is:
RRF_score = Σ_i 1 / (k + rank_i)

where rank_i is the chunk’s rank in search result list i, the sum runs over the lists in which the chunk appears, and k is a constant (default 60). Chunks that rank high in both lists get the highest fused scores. Chunks that appear in only one list still receive a score from that list, which is why hybrid search catches results that either method alone might miss.
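The formula can be sketched in a few lines of Python. This is a minimal illustration of RRF itself, not Tensoras's internal code; the function name `reciprocal_rank_fusion` and the chunk IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of chunk IDs; ranks are 1-based, best first."""
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the chunks it contains.
            fused[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two searches.
vector_hits = ["c1", "c2", "c3"]
keyword_hits = ["c2", "c4", "c1"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Here `c2` comes out on top with 1/62 + 1/61, just ahead of `c1` with 1/61 + 1/63, because both appear in both lists. `c4` and `c3` each appear in only one list, so they score lower but remain in the fused ranking.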
Search Types
You can control the search mode via the search_type parameter when using the Retrieval API or when configuring Knowledge Base queries:
| Search Type | Description |
|---|---|
| hybrid (default) | Vector + keyword search with RRF fusion |
| vector | Semantic similarity only |
| keyword | BM25 keyword matching only |
Python — Direct Retrieval
from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

# Hybrid search (default)
results = client.retrieval.query(
    knowledge_base_id="kb_abc123",
    query="How do I set up single sign-on?",
    search_type="hybrid",
    top_k=5,
)

for result in results.chunks:
    print(f"Score: {result.score:.4f}")
    print(f"Source: {result.source}")
    print(f"Text: {result.text[:200]}...")
    print()

Node.js — Direct Retrieval
import Tensoras from "tensoras";

const client = new Tensoras({ apiKey: "tns_your_key_here" });

const results = await client.retrieval.query({
  knowledgeBaseId: "kb_abc123",
  query: "How do I set up single sign-on?",
  searchType: "hybrid",
  topK: 5,
});

for (const result of results.chunks) {
  console.log(`Score: ${result.score.toFixed(4)}`);
  console.log(`Source: ${result.source}`);
  console.log(`Text: ${result.text.slice(0, 200)}...`);
}

Using with Chat Completions
When you pass knowledge_bases in a chat completions request, hybrid search runs automatically behind the scenes:
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "What is our refund policy?"},
    ],
    knowledge_bases=["kb_abc123"],
)

When to Use Which Mode
| Mode | Best For |
|---|---|
| Hybrid (default) | Most queries — combines semantic understanding with exact matching |
| Vector | Conceptual or paraphrased queries where exact terms do not matter |
| Keyword | Queries with specific names, IDs, error codes, or exact phrases |
Recommendation: Start with the default hybrid mode. Switch to vector-only or keyword-only mode only when you find specific query patterns where one signal clearly dominates.
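One way to spot such patterns is to run the same query in all three modes and compare the top sources. The helper below is a sketch that relies only on the `client.retrieval.query` call and the `chunks` / `source` fields shown in the examples above; the name `compare_search_types` is hypothetical.

```python
def compare_search_types(client, knowledge_base_id, query, top_k=5):
    """Return the top chunk sources per search mode for one query."""
    comparison = {}
    for search_type in ("hybrid", "vector", "keyword"):
        results = client.retrieval.query(
            knowledge_base_id=knowledge_base_id,
            query=query,
            search_type=search_type,
            top_k=top_k,
        )
        # Keep only the source of each chunk so the modes are easy to compare.
        comparison[search_type] = [chunk.source for chunk in results.chunks]
    return comparison
```

Called as `compare_search_types(client, "kb_abc123", "error E1042")`, it returns a dictionary keyed by mode. If one mode consistently surfaces the right sources for a class of queries and the others do not, that is a reasonable signal to pin `search_type` for those queries.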
Reranking
For improved relevance, Tensoras supports optional second-pass reranking. After the initial retrieval (hybrid, vector, or keyword), a cross-encoder reranker re-scores the top results by jointly attending to the query and each chunk.
Reranking costs more compute than the initial retrieval but can improve precision, especially when the initial result set is noisy.
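The two-stage pattern can be sketched as follows. The scoring function here is a toy token-overlap measure standing in for a cross-encoder, which is an assumption made to keep the example self-contained; the names `rerank` and `token_overlap` and the sample chunks are hypothetical, and the actual Tensoras reranker is not described in this page.

```python
def rerank(query, chunks, score_fn, rerank_top_k=None):
    """Re-score first-pass chunks with score_fn(query, chunk) and keep the best."""
    scored = [(score_fn(query, chunk), chunk) for chunk in chunks]
    # Stable sort: chunks with equal scores keep their first-pass order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    limit = len(chunks) if rerank_top_k is None else rerank_top_k
    return [chunk for _, chunk in scored[:limit]]


def token_overlap(query, chunk):
    """Toy stand-in for a cross-encoder: fraction of query tokens found in the chunk.

    Assumes a non-empty query.
    """
    query_tokens = set(query.lower().split())
    chunk_tokens = set(chunk.lower().split())
    return len(query_tokens & chunk_tokens) / len(query_tokens)


# Hypothetical first-pass results, in retrieval order.
first_pass = [
    "Single sign-on overview and supported providers",
    "Billing and invoices",
    "How to configure SAML SSO for your workspace",
]
top = rerank("configure SAML SSO", first_pass, token_overlap, rerank_top_k=2)
```

The third chunk moves to the front because it matches every query token, and the first chunk keeps second place through the stable sort. A real cross-encoder scores the query and chunk jointly with a model rather than by token overlap, which is where the extra cost comes from.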
Enabling Reranking
results = client.retrieval.query(
    knowledge_base_id="kb_abc123",
    query="How do I configure SAML SSO?",
    search_type="hybrid",
    top_k=10,
    rerank=True,
    rerank_top_k=5,  # return top 5 after reranking
)

const results = await client.retrieval.query({
  knowledgeBaseId: "kb_abc123",
  query: "How do I configure SAML SSO?",
  searchType: "hybrid",
  topK: 10,
  rerank: true,
  rerankTopK: 5,
});

| Parameter | Default | Description |
|---|---|---|
| rerank | false | Enable second-pass reranking |
| rerank_top_k | Same as top_k | Number of results to return after reranking |
Related
- RAG Overview — end-to-end RAG pipeline
- Chunking Strategies — how documents are split before indexing
- Citations — source attribution in RAG responses
- Retrieval API — full retrieval endpoint reference
- Rerank API — standalone reranking endpoint