# Database Connector
The database connector lets you use a PostgreSQL or MySQL database as a data source for a Tensoras Knowledge Base. Rows from a table or query are extracted, chunked, embedded, and indexed so you can query them via RAG.
> **Beta:** The database connector is currently in beta. The API and behavior may change. Please report any issues to support@tensoras.ai.
## Overview
The database connector works by running a query against your database, extracting a text column from each row, and ingesting the results as documents into a Knowledge Base. Optional metadata columns are stored alongside each document for filtered retrieval.
This is useful for:
- Knowledge bases built from support tickets, product descriptions, or internal wikis stored in a database
- RAG over structured data that changes frequently
- Combining database content with other data sources (S3, file uploads) in a single Knowledge Base
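The row-to-document mapping described above can be sketched in a few lines. The function name and document shape below are illustrative only, not the connector's actual internals; they just show how `text_column`, `title_column`, and `metadata_columns` are applied to each row:

```python
def build_documents(rows, text_column, title_column=None, metadata_columns=()):
    """Map query result rows (dicts) to documents ready for chunking and embedding."""
    documents = []
    for row in rows:
        documents.append({
            "text": row[text_column],  # main content to embed
            "title": row.get(title_column) if title_column else None,
            "metadata": {col: row[col] for col in metadata_columns},
        })
    return documents

rows = [
    {"id": 1, "title": "Refund policy", "body": "Refunds are processed in 5 days.", "category": "billing"},
    {"id": 2, "title": "Login help", "body": "Reset your password from the login page.", "category": "account"},
]
docs = build_documents(rows, text_column="body", title_column="title", metadata_columns=["category"])
```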
## Supported Databases
| Database | Versions | Connection Protocol |
|---|---|---|
| PostgreSQL | 12+ | `postgresql://` |
| MySQL | 8.0+ | `mysql://` |
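Both protocols are standard URI schemes, so a connection string can be sanity-checked before handing it to the connector using only Python's standard library. This helper is illustrative and not part of the Tensoras SDK:

```python
from urllib.parse import urlsplit

def check_connection_string(uri):
    """Parse a database URI and confirm it uses a supported scheme."""
    parts = urlsplit(uri)
    if parts.scheme not in ("postgresql", "mysql"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

info = check_connection_string("postgresql://user:password@db.example.com:5432/mydb")
# info["host"] == "db.example.com", info["database"] == "mydb"
```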
## Setup
### Step 1: Create a Knowledge Base
If you do not already have a Knowledge Base, create one:
```python
from tensoras import Tensoras

client = Tensoras()

kb = client.knowledge_bases.create(
    name="support-tickets",
    description="Customer support tickets from the database",
)
```

### Step 2: Add the Database Data Source
Provide a connection string, the query or table to extract data from, and column mappings:
```python
data_source = client.knowledge_bases.data_sources.create(
    knowledge_base_id=kb.id,
    type="database",
    config={
        "connection_string": "postgresql://user:password@db.example.com:5432/mydb",
        "query": "SELECT id, title, body, category, created_at FROM support_tickets WHERE status = 'resolved'",
        "text_column": "body",
        "title_column": "title",  # optional, used as document title
        "metadata_columns": ["category", "created_at"],  # optional, stored as filterable metadata
        "id_column": "id",  # optional, used for incremental sync
    },
)

print(data_source.id)
```

```typescript
const dataSource = await client.knowledgeBases.dataSources.create({
  knowledgeBaseId: kb.id,
  type: "database",
  config: {
    connectionString: "postgresql://user:password@db.example.com:5432/mydb",
    query: "SELECT id, title, body, category, created_at FROM support_tickets WHERE status = 'resolved'",
    textColumn: "body",
    titleColumn: "title", // optional, used as document title
    metadataColumns: ["category", "created_at"], // optional, stored as filterable metadata
    idColumn: "id", // optional, used for incremental sync
  },
});

console.log(dataSource.id);
```

### Configuration Reference
| Field | Required | Description |
|---|---|---|
| `connection_string` | Yes | Database connection URI (`postgresql://...` or `mysql://...`) |
| `query` | Yes | SQL query to execute. Can be a `SELECT` statement or a table name. |
| `text_column` | Yes | Column containing the main text content to embed and search |
| `title_column` | No | Column to use as the document title (shown in citations) |
| `metadata_columns` | No | Array of column names to store as filterable metadata |
| `id_column` | No | Column with a unique row identifier, used for incremental sync |
> **Security:** Connection strings are encrypted at rest. Use a read-only database user with access only to the tables you need (see Database User Permissions).
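Passwords containing characters like `@`, `:`, or `/` will break a raw connection URI, so credentials should be percent-encoded when the string is assembled. A small stdlib sketch (the helper name is ours, not part of the Tensoras SDK):

```python
from urllib.parse import quote_plus

def build_connection_string(user, password, host, port, database, scheme="postgresql"):
    """Percent-encode credentials so characters like '@' or ':' don't break the URI."""
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"

conn = build_connection_string("tensoras_reader", "p@ss:w/rd", "db.example.com", 5432, "mydb")
# → "postgresql://tensoras_reader:p%40ss%3Aw%2Frd@db.example.com:5432/mydb"
```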
### Using a Table Name Instead of a Query
If you want to ingest all rows from a table, you can pass the table name directly:
```python
data_source = client.knowledge_bases.data_sources.create(
    knowledge_base_id=kb.id,
    type="database",
    config={
        "connection_string": "postgresql://user:password@db.example.com:5432/mydb",
        "query": "articles",  # table name instead of SELECT statement
        "text_column": "content",
        "title_column": "title",
        "id_column": "id",
    },
)
```

This is equivalent to `SELECT * FROM articles`.
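The equivalence amounts to a simple normalization step, which can be sketched as follows (illustrative only; the connector's actual handling may differ):

```python
def normalize_query(query):
    """Treat a bare table name as shorthand for selecting every row."""
    q = query.strip()
    if q.lower().startswith("select"):
        return q  # already a SELECT statement; pass through unchanged
    return f"SELECT * FROM {q}"

normalize_query("articles")                    # "SELECT * FROM articles"
normalize_query("SELECT id, body FROM posts")  # unchanged
```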
### Step 3: Trigger a Sync
```python
job = client.knowledge_bases.data_sources.sync(
    knowledge_base_id=kb.id,
    data_source_id=data_source.id,
)

print(f"Job {job.id}: {job.status}")
```

```typescript
const job = await client.knowledgeBases.dataSources.sync({
  knowledgeBaseId: kb.id,
  dataSourceId: dataSource.id,
});

console.log(`Job ${job.id}: ${job.status}`);
```

### Step 4: Query with RAG
Once ingestion is complete, use the Knowledge Base in chat completions. Columns listed in `metadata_columns` are stored with each document and can be used for filtered retrieval:
```python
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "How do I handle a refund request?"},
    ],
    knowledge_bases=[kb.id],
)

print(response.choices[0].message.content)
```

## Sync Schedule
Set up a recurring sync to keep your Knowledge Base up to date with your database:
```python
client.knowledge_bases.data_sources.update(
    knowledge_base_id=kb.id,
    data_source_id=data_source.id,
    sync_schedule="0 */4 * * *",  # every 4 hours
)
```

```typescript
await client.knowledgeBases.dataSources.update({
  knowledgeBaseId: kb.id,
  dataSourceId: dataSource.id,
  syncSchedule: "0 */4 * * *", // every 4 hours
});
```

## Incremental Sync
If you provide an `id_column`, the connector tracks which rows have been ingested. On subsequent syncs:
- New rows (IDs not previously seen) are ingested.
- Modified rows (if using a query with a `WHERE updated_at > last_sync` clause) are re-ingested.
- Deleted rows (IDs no longer returned by the query) are removed from the index.
Without an `id_column`, every sync performs a full re-ingestion.
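The ID-based bookkeeping above amounts to comparing the set of IDs returned by the current sync against the previously seen set. A minimal sketch of that diff (not the connector's actual implementation):

```python
def plan_sync(previous_ids, current_ids):
    """Decide which rows to ingest and which indexed documents to remove."""
    previous, current = set(previous_ids), set(current_ids)
    return {
        "ingest": sorted(current - previous),  # new rows, not seen before
        "remove": sorted(previous - current),  # rows deleted upstream
    }

plan = plan_sync(previous_ids=[1, 2, 3], current_ids=[2, 3, 4, 5])
# plan == {"ingest": [4, 5], "remove": [1]}
```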
## Database User Permissions
Create a dedicated read-only database user for the Tensoras connector:
### PostgreSQL
```sql
CREATE USER tensoras_reader WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE mydb TO tensoras_reader;
GRANT USAGE ON SCHEMA public TO tensoras_reader;
GRANT SELECT ON support_tickets TO tensoras_reader;
```

### MySQL
```sql
CREATE USER 'tensoras_reader'@'%' IDENTIFIED BY 'secure_password';
GRANT SELECT ON mydb.support_tickets TO 'tensoras_reader'@'%';
FLUSH PRIVILEGES;
```

> **Best practice:** Only grant `SELECT` on the specific tables referenced in your query. Never use a database admin account for the connector.
## Network Configuration
The Tensoras connector worker needs network access to your database. Options include:
- Public endpoint — if your database has a public IP, allow connections from Tensoras IP ranges (see Console > Settings > IP Allowlist for current ranges)
- VPC peering — available on Enterprise plans, peer your AWS VPC with the Tensoras VPC
- SSH tunnel — contact support@tensoras.ai for SSH tunnel configuration
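Before debugging credentials or permissions, it can help to confirm the database port is reachable at all from the machine you are testing on. A small stdlib check (not part of the Tensoras SDK; it only proves TCP reachability, not that authentication will succeed):

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: can_reach("db.example.com", 5432)
```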
## Limitations (Beta)
- Maximum of 1 million rows per sync. For larger datasets, use a `WHERE` clause to limit results or split across multiple data sources.
- Queries must complete within 5 minutes. Optimize slow queries or add appropriate indexes.
- Binary columns (`BLOB`, `BYTEA`) are not supported.
- Only text-based content is indexed. Numeric and date columns can be stored as metadata for filtering but are not embedded.
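One way to stay under the row limit is to split a large table across several data sources by ID range, giving each data source its own bounded query. A sketch assuming a dense numeric ID column (the helper and its parameters are illustrative):

```python
def partition_queries(table, id_column, max_id, rows_per_source=1_000_000):
    """Generate one bounded query per data source, each covering a range of IDs."""
    queries = []
    start = 1
    while start <= max_id:
        end = min(start + rows_per_source - 1, max_id)
        queries.append(
            f"SELECT * FROM {table} WHERE {id_column} BETWEEN {start} AND {end}"
        )
        start = end + 1
    return queries

partition_queries("articles", "id", max_id=2_500_000)
# → 3 queries covering IDs 1-1000000, 1000001-2000000, and 2000001-2500000
```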
## Troubleshooting
| Problem | Solution |
|---|---|
| "Connection refused" | Verify the database is reachable from the internet or that VPC peering is configured |
| "Authentication failed" | Check the username, password, and database name in the connection string |
| "Column not found" | Verify that `text_column`, `title_column`, and `metadata_columns` match the column names returned by your query |
| Sync is slow | Add an index on the `id_column` and limit the query with a `WHERE` clause |
| "Query timeout" | Optimize the query or add database indexes; queries must complete within 5 minutes |
## Next Steps
- S3 Connector — connect an S3 bucket as a data source
- RAG Best Practices — optimize retrieval quality
- Data Sources API — full API reference
- Connectors Overview — all available connector types