# Database Connector
The database connector lets you use a PostgreSQL or MySQL database as a data source for a Tensoras Knowledge Base. Rows from a table or query are extracted, chunked, embedded, and indexed so you can query them via RAG.
> **Beta:** The database connector is currently in beta. The API and behavior may change. Please report any issues to support@tensoras.ai.
## Overview
The database connector works by running a query against your database, extracting a text column from each row, and ingesting the results as documents into a Knowledge Base. Optional metadata columns are stored alongside each document for filtered retrieval.
This is useful for:
- Knowledge bases built from support tickets, product descriptions, or internal wikis stored in a database
- RAG over structured data that changes frequently
- Combining database content with other data sources (S3, file uploads) in a single Knowledge Base
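The row-to-document mapping described above can be sketched in a few lines. The function name and document shape below are illustrative only, not the connector's actual internals; they just show how `text_column`, `title_column`, and `metadata_columns` are applied to each row:

```python
def build_documents(rows, text_column, title_column=None, metadata_columns=()):
    """Map query result rows (dicts) to documents ready for chunking and embedding."""
    documents = []
    for row in rows:
        documents.append({
            "text": row[text_column],  # main content to embed
            "title": row.get(title_column) if title_column else None,
            "metadata": {col: row[col] for col in metadata_columns},
        })
    return documents

rows = [
    {"id": 1, "title": "Refund policy", "body": "Refunds are processed in 5 days.", "category": "billing"},
    {"id": 2, "title": "Login help", "body": "Reset your password from the login page.", "category": "account"},
]
docs = build_documents(rows, text_column="body", title_column="title", metadata_columns=["category"])
```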
## Supported Databases
| Database | Versions | Connection Protocol |
|---|---|---|
| PostgreSQL | 12+ | `postgresql://` |
| MySQL | 8.0+ | `mysql://` |
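Both protocols are standard URI schemes, so a connection string can be sanity-checked before handing it to the connector using only Python's standard library. This helper is illustrative and not part of the Tensoras SDK:

```python
from urllib.parse import urlsplit

def check_connection_string(uri):
    """Parse a database URI and confirm it uses a supported scheme."""
    parts = urlsplit(uri)
    if parts.scheme not in ("postgresql", "mysql"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

info = check_connection_string("postgresql://user:password@db.example.com:5432/mydb")
# info["host"] == "db.example.com", info["database"] == "mydb"
```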
## Setup
### Step 1: Create a Knowledge Base
If you do not already have a Knowledge Base, create one:
```python
from tensoras import Tensoras

client = Tensoras()

kb = client.knowledge_bases.create(
    name="support-tickets",
    description="Customer support tickets from the database",
)
```

### Step 2: Add the Database Data Source
Provide a connection string, the query or table to extract data from, and column mappings:
```python
data_source = client.knowledge_bases.data_sources.create(
    knowledge_base_id=kb.id,
    type="database",
    config={
        "connection_string": "postgresql://user:password@db.example.com:5432/mydb",
        "query": "SELECT id, title, body, category, created_at FROM support_tickets WHERE status = 'resolved'",
        "text_column": "body",
        "title_column": "title",  # optional, used as document title
        "metadata_columns": ["category", "created_at"],  # optional, stored as filterable metadata
        "id_column": "id",  # optional, used for incremental sync
    },
)

print(data_source.id)
```

```typescript
const dataSource = await client.knowledgeBases.dataSources.create({
  knowledgeBaseId: kb.id,
  type: "database",
  config: {
    connectionString: "postgresql://user:password@db.example.com:5432/mydb",
    query: "SELECT id, title, body, category, created_at FROM support_tickets WHERE status = 'resolved'",
    textColumn: "body",
    titleColumn: "title", // optional, used as document title
    metadataColumns: ["category", "created_at"], // optional, stored as filterable metadata
    idColumn: "id", // optional, used for incremental sync
  },
});

console.log(dataSource.id);
```

### Configuration Reference
| Field | Required | Description |
|---|---|---|
| `connection_string` | Yes | Database connection URI (`postgresql://...` or `mysql://...`) |
| `query` | Yes | SQL query to execute. Can be a `SELECT` statement or a table name. |
| `text_column` | Yes | Column containing the main text content to embed and search |
| `title_column` | No | Column to use as the document title (shown in citations) |
| `metadata_columns` | No | Array of column names to store as filterable metadata |
| `id_column` | No | Column with a unique row identifier, used for incremental sync |
> **Security:** Connection strings are encrypted at rest. Use a read-only database user with access only to the tables you need (see Database User Permissions).
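Passwords containing characters like `@`, `:`, or `/` will break a raw connection URI, so credentials should be percent-encoded when the string is assembled. A small stdlib sketch (the helper name is ours, not part of the Tensoras SDK):

```python
from urllib.parse import quote_plus

def build_connection_string(user, password, host, port, database, scheme="postgresql"):
    """Percent-encode credentials so characters like '@' or ':' don't break the URI."""
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"

conn = build_connection_string("tensoras_reader", "p@ss:w/rd", "db.example.com", 5432, "mydb")
# → "postgresql://tensoras_reader:p%40ss%3Aw%2Frd@db.example.com:5432/mydb"
```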
### Using a Table Name Instead of a Query
If you want to ingest all rows from a table, you can pass the table name directly:
```python
data_source = client.knowledge_bases.data_sources.create(
    knowledge_base_id=kb.id,
    type="database",
    config={
        "connection_string": "postgresql://user:password@db.example.com:5432/mydb",
        "query": "articles",  # table name instead of SELECT statement
        "text_column": "content",
        "title_column": "title",
        "id_column": "id",
    },
)
```

This is equivalent to `SELECT * FROM articles`.
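The equivalence amounts to a simple normalization step, which can be sketched as follows (illustrative only; the connector's actual handling may differ):

```python
def normalize_query(query):
    """Treat a bare table name as shorthand for selecting every row."""
    q = query.strip()
    if q.lower().startswith("select"):
        return q  # already a SELECT statement; pass through unchanged
    return f"SELECT * FROM {q}"

normalize_query("articles")                    # "SELECT * FROM articles"
normalize_query("SELECT id, body FROM posts")  # unchanged
```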
### Step 3: Trigger a Sync
```python
job = client.knowledge_bases.data_sources.sync(
    knowledge_base_id=kb.id,
    data_source_id=data_source.id,
)

print(f"Job {job.id}: {job.status}")
```

```typescript
const job = await client.knowledgeBases.dataSources.sync({
  knowledgeBaseId: kb.id,
  dataSourceId: dataSource.id,
});

console.log(`Job ${job.id}: ${job.status}`);
```

### Step 4: Query with RAG
Once ingestion is complete, use the Knowledge Base in chat completions. Columns listed in `metadata_columns` are stored with each document and can be used for filtered retrieval:
```python
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "How do I handle a refund request?"},
    ],
    knowledge_bases=[kb.id],
)

print(response.choices[0].message.content)
```

## Sync Schedule
Set up a recurring sync to keep your Knowledge Base up to date with your database:
```python
client.knowledge_bases.data_sources.update(
    knowledge_base_id=kb.id,
    data_source_id=data_source.id,
    sync_schedule="0 */4 * * *",  # every 4 hours
)
```

```typescript
await client.knowledgeBases.dataSources.update({
  knowledgeBaseId: kb.id,
  dataSourceId: dataSource.id,
  syncSchedule: "0 */4 * * *", // every 4 hours
});
```

## Incremental Sync
If you provide an `id_column`, the connector tracks which rows have been ingested. On subsequent syncs:
- New rows (IDs not previously seen) are ingested.
- Modified rows (if using a query with a `WHERE updated_at > last_sync` clause) are re-ingested.
- Deleted rows (IDs no longer returned by the query) are removed from the index.
Without an `id_column`, every sync performs a full re-ingestion.
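The ID-based bookkeeping above amounts to comparing the set of IDs returned by the current sync against the previously seen set. A minimal sketch of that diff (not the connector's actual implementation):

```python
def plan_sync(previous_ids, current_ids):
    """Decide which rows to ingest and which indexed documents to remove."""
    previous, current = set(previous_ids), set(current_ids)
    return {
        "ingest": sorted(current - previous),  # new rows, not seen before
        "remove": sorted(previous - current),  # rows deleted upstream
    }

plan = plan_sync(previous_ids=[1, 2, 3], current_ids=[2, 3, 4, 5])
# plan == {"ingest": [4, 5], "remove": [1]}
```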
## Database User Permissions
Create a dedicated read-only database user for the Tensoras connector:
### PostgreSQL
```sql
CREATE USER tensoras_reader WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE mydb TO tensoras_reader;
GRANT USAGE ON SCHEMA public TO tensoras_reader;
GRANT SELECT ON support_tickets TO tensoras_reader;
```

### MySQL
```sql
CREATE USER 'tensoras_reader'@'%' IDENTIFIED BY 'secure_password';
GRANT SELECT ON mydb.support_tickets TO 'tensoras_reader'@'%';
FLUSH PRIVILEGES;
```

> **Best practice:** Only grant `SELECT` on the specific tables referenced in your query. Never use a database admin account for the connector.
## Network Configuration
The Tensoras connector worker needs network access to your database. Options include:
- Public endpoint — if your database has a public IP, allow connections from Tensoras IP ranges (see Console > Settings > IP Allowlist for current ranges)
- VPC peering — available on Enterprise plans, peer your AWS VPC with the Tensoras VPC
- SSH tunnel — contact support@tensoras.ai for SSH tunnel configuration
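Before debugging credentials or permissions, it can help to confirm the database port is reachable at all from the machine you are testing on. A small stdlib check (not part of the Tensoras SDK; it only proves TCP reachability, not that authentication will succeed):

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: can_reach("db.example.com", 5432)
```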
## Limitations (Beta)
- Maximum of 1 million rows per sync. For larger datasets, use a `WHERE` clause to limit results or split across multiple data sources.
- Queries must complete within 5 minutes. Optimize slow queries or add appropriate indexes.
- Binary columns (`BLOB`, `BYTEA`) are not supported.
- Only text-based content is indexed. Numeric and date columns can be stored as metadata for filtering but are not embedded.
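One way to stay under the row limit is to split a large table across several data sources by ID range, giving each data source its own bounded query. A sketch assuming a dense numeric ID column (the helper and its parameters are illustrative):

```python
def partition_queries(table, id_column, max_id, rows_per_source=1_000_000):
    """Generate one bounded query per data source, each covering a range of IDs."""
    queries = []
    start = 1
    while start <= max_id:
        end = min(start + rows_per_source - 1, max_id)
        queries.append(
            f"SELECT * FROM {table} WHERE {id_column} BETWEEN {start} AND {end}"
        )
        start = end + 1
    return queries

partition_queries("articles", "id", max_id=2_500_000)
# → 3 queries covering IDs 1-1000000, 1000001-2000000, and 2000001-2500000
```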
## Troubleshooting
| Problem | Solution |
|---|---|
| "Connection refused" | Verify the database is reachable from the internet or that VPC peering is configured |
| "Authentication failed" | Check the username, password, and database name in the connection string |
| "Column not found" | Verify that `text_column`, `title_column`, and `metadata_columns` match the column names returned by your query |
| Sync is slow | Add an index on the `id_column` and limit the query with a `WHERE` clause |
| "Query timeout" | Optimize the query or add database indexes; queries must complete within 5 minutes |
## Next Steps
- S3 Connector — connect an S3 bucket as a data source
- RAG Best Practices — optimize retrieval quality
- Data Sources API — full API reference
- Connectors Overview — all available connector types