Data Sources
Add and manage data sources for a knowledge base. Data sources define where documents are ingested from. Tensoras supports multiple source types including file uploads, web crawling, cloud storage, and third-party integrations.
Endpoints
POST https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources
GET https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources
GET https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources/{ds_id}
DELETE https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources/{ds_id}
Authentication
All requests require a bearer API key in the Authorization header:
Authorization: Bearer tns_your_key_here
Create Data Source
Add a new data source to a knowledge base. Once created, an ingestion job is automatically started to process and index the documents from the source.
Request
POST /v1/knowledge-bases/{kb_id}/data-sources

| Parameter | Type | Required | Description |
|---|---|---|---|
| kb_id | string | Yes | The knowledge base ID (path parameter). |
| type | string | Yes | The data source type. One of "file_upload", "web_crawl", "s3", "gcs", "confluence", "notion", "google_drive". |
| name | string | No | A human-readable name for the data source. |
| config | object | Yes | Source-specific configuration. See source types below. |
Source Types
file_upload
Ingest documents from uploaded files. Upload files first using the Files API with purpose: "knowledge-base".
{
"type": "file_upload",
"config": {
"file_ids": ["file-abc123", "file-def456"]
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| file_ids | array | Yes | An array of file IDs to ingest. |
web_crawl
Crawl and ingest content from web pages.
{
"type": "web_crawl",
"config": {
"urls": ["https://docs.example.com"],
"max_depth": 3,
"max_pages": 100,
"include_patterns": ["https://docs.example.com/*"],
"exclude_patterns": ["*/changelog*"]
}
}

| Config Field | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | array | Yes | — | Seed URLs to start crawling from. |
| max_depth | integer | No | 3 | Maximum link depth to follow from seed URLs. |
| max_pages | integer | No | 100 | Maximum number of pages to crawl. |
| include_patterns | array | No | — | Glob patterns for URLs to include. |
| exclude_patterns | array | No | — | Glob patterns for URLs to exclude. |
s3
Ingest documents from an Amazon S3 bucket.
{
"type": "s3",
"config": {
"bucket": "my-docs-bucket",
"prefix": "documents/",
"region": "us-east-1",
"credentials": {
"access_key_id": "AKIA...",
"secret_access_key": "..."
}
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| bucket | string | Yes | The S3 bucket name. |
| prefix | string | No | Object key prefix to filter files. |
| region | string | No | The AWS region. Defaults to "us-east-1". |
| credentials | object | Yes | AWS credentials with access_key_id and secret_access_key. |
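An s3 source is created with the same POST request as any other type; only the config differs. A minimal Python sketch (the `build_s3_body` and `create_data_source` helpers below are illustrative, not part of any SDK):

```python
import requests

API_BASE = "https://api.tensoras.ai/v1"

def build_s3_body(bucket, access_key_id, secret_access_key,
                  prefix=None, region=None, name=None):
    """Assemble the request body for an s3 data source."""
    config = {
        "bucket": bucket,
        "credentials": {
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
        },
    }
    if prefix:
        config["prefix"] = prefix
    if region:
        config["region"] = region  # server defaults to "us-east-1" if omitted
    body = {"type": "s3", "config": config}
    if name:
        body["name"] = name
    return body

def create_data_source(kb_id, body, api_key):
    """POST the body to /v1/knowledge-bases/{kb_id}/data-sources."""
    resp = requests.post(
        f"{API_BASE}/knowledge-bases/{kb_id}/data-sources",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json=body,
    )
    resp.raise_for_status()
    return resp.json()
```

Keeping the body builder separate from the HTTP call makes the config easy to validate or log (with credentials scrubbed) before sending.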
gcs
Ingest documents from a Google Cloud Storage bucket.
{
"type": "gcs",
"config": {
"bucket": "my-docs-bucket",
"prefix": "documents/",
"credentials": {
"service_account_json": "..."
}
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| bucket | string | Yes | The GCS bucket name. |
| prefix | string | No | Object prefix to filter files. |
| credentials | object | Yes | GCS credentials with service_account_json. |
confluence
Ingest pages from an Atlassian Confluence workspace.
{
"type": "confluence",
"config": {
"url": "https://your-domain.atlassian.net",
"space_keys": ["ENG", "PRODUCT"],
"credentials": {
"email": "user@example.com",
"api_token": "..."
}
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The Confluence instance URL. |
| space_keys | array | No | Specific space keys to ingest. Omit to ingest all accessible spaces. |
| credentials | object | Yes | Confluence credentials with email and api_token. |
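A small Python helper for assembling the confluence request body (the function name and signature are illustrative; pair it with a POST to the data-sources endpoint as shown in the Examples section):

```python
def build_confluence_body(url, email, api_token, space_keys=None, name=None):
    """Assemble the request body for a confluence data source.

    Omitting space_keys ingests every space the account can read,
    so passing an explicit list is usually safer for large instances.
    """
    config = {
        "url": url,
        "credentials": {"email": email, "api_token": api_token},
    }
    if space_keys:
        config["space_keys"] = list(space_keys)
    body = {"type": "confluence", "config": config}
    if name:
        body["name"] = name
    return body
```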
notion
Ingest pages from a Notion workspace.
{
"type": "notion",
"config": {
"page_ids": ["page-id-1", "page-id-2"],
"database_ids": ["db-id-1"],
"credentials": {
"integration_token": "secret_..."
}
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| page_ids | array | No | Specific Notion page IDs to ingest. |
| database_ids | array | No | Specific Notion database IDs to ingest. |
| credentials | object | Yes | Notion credentials with integration_token. |
google_drive
Ingest documents from Google Drive.
{
"type": "google_drive",
"config": {
"folder_ids": ["folder-id-1"],
"include_shared_drives": true,
"credentials": {
"service_account_json": "..."
}
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| folder_ids | array | No | Specific Google Drive folder IDs. Omit to ingest all accessible files. |
| include_shared_drives | boolean | No | Whether to include shared drives. Default: false. |
| credentials | object | Yes | Google credentials with service_account_json. |
Response Body
{
"id": "ds_abc123",
"object": "data_source",
"knowledge_base_id": "kb_abc123",
"type": "file_upload",
"name": "Product PDFs",
"config": {
"file_ids": ["file-abc123", "file-def456"]
},
"status": "processing",
"document_count": 0,
"created_at": 1709123456,
"updated_at": 1709123456,
"last_synced_at": null
}

| Field | Type | Description |
|---|---|---|
| id | string | The unique data source identifier. |
| object | string | Always "data_source". |
| knowledge_base_id | string | The ID of the parent knowledge base. |
| type | string | The data source type. |
| name | string | The name of the data source. |
| config | object | The source-specific configuration (credentials are redacted). |
| status | string | The current status. One of "processing", "active", "error". |
| document_count | integer | The number of documents ingested from this source. |
| created_at | integer | Unix timestamp of when the data source was created. |
| updated_at | integer | Unix timestamp of the last update. |
| last_synced_at | integer or null | Unix timestamp of the last successful sync. |
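Because ingestion runs asynchronously, a newly created source starts in "processing". One way to wait for ingestion to finish is to poll the get endpoint until the status leaves "processing"; a sketch (the helper names, interval, and timeout are illustrative, not part of any SDK):

```python
import time
import requests

API_BASE = "https://api.tensoras.ai/v1"

def is_settled(status):
    """A data source is settled once it is "active" or "error"."""
    return status in ("active", "error")

def wait_until_settled(kb_id, ds_id, api_key, interval=5.0, timeout=600.0):
    """Poll the data source until ingestion finishes or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(
            f"{API_BASE}/knowledge-bases/{kb_id}/data-sources/{ds_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        resp.raise_for_status()
        ds = resp.json()
        if is_settled(ds["status"]):
            return ds  # inspect ds["status"] and ds["document_count"]
        time.sleep(interval)
    raise TimeoutError(f"{ds_id} still processing after {timeout}s")
```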
List Data Sources
Retrieve all data sources for a knowledge base.
Request
GET /v1/knowledge-bases/{kb_id}/data-sources

| Parameter | Type | Required | Description |
|---|---|---|---|
| kb_id | string | Yes | The knowledge base ID (path parameter). |
| limit | integer | No | Maximum number of results. Default: 20, max: 100. |
| after | string | No | Cursor for pagination. |
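The `after` cursor can be combined with `limit` to walk every page. A sketch of a cursor-driven loop, under the assumption that the cursor is the `id` of the last item on the previous page (rely on the returned `has_more` flag, not the page size):

```python
def paginate(fetch_page):
    """Yield items from every page of a cursor-paginated list.

    fetch_page(after) must return a dict with "data" and "has_more"
    keys, matching the list response shape below.
    """
    after = None
    while True:
        page = fetch_page(after)
        yield from page["data"]
        if not page["has_more"] or not page["data"]:
            break
        after = page["data"][-1]["id"]
```

With the requests library, `fetch_page` would GET this endpoint with `params={"limit": 100, "after": after}` (omitting `after` on the first call).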
Response Body
{
"object": "list",
"data": [
{
"id": "ds_abc123",
"object": "data_source",
"knowledge_base_id": "kb_abc123",
"type": "file_upload",
"name": "Product PDFs",
"status": "active",
"document_count": 15,
"created_at": 1709123456,
"last_synced_at": 1709145056
}
],
"has_more": false
}
Get Data Source
Retrieve details about a specific data source.
Request
GET /v1/knowledge-bases/{kb_id}/data-sources/{ds_id}

| Parameter | Type | Required | Description |
|---|---|---|---|
| kb_id | string | Yes | The knowledge base ID (path parameter). |
| ds_id | string | Yes | The data source ID (path parameter). |
Response Body
Returns a single data source object (same schema as the create response).
Delete Data Source
Delete a data source and remove all documents that were ingested from it.
Request
DELETE /v1/knowledge-bases/{kb_id}/data-sources/{ds_id}

| Parameter | Type | Required | Description |
|---|---|---|---|
| kb_id | string | Yes | The knowledge base ID (path parameter). |
| ds_id | string | Yes | The data source ID (path parameter). |
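Since deleting a source also removes every document ingested from it, it is worth centralizing the call so IDs can be checked first. A minimal Python sketch (the helper names are illustrative):

```python
import requests

API_BASE = "https://api.tensoras.ai/v1"

def data_source_url(kb_id, ds_id):
    """Build the URL for a single data source."""
    return f"{API_BASE}/knowledge-bases/{kb_id}/data-sources/{ds_id}"

def delete_data_source(kb_id, ds_id, api_key):
    """Delete a data source and all documents ingested from it."""
    resp = requests.delete(
        data_source_url(kb_id, ds_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    resp.raise_for_status()
    return resp.json()  # expected shape: {"id": ..., "deleted": true}
```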
Response Body
{
"id": "ds_abc123",
"object": "data_source",
"deleted": true
}
Examples
Add Files to a Knowledge Base
curl
# Step 1: Upload files
curl https://api.tensoras.ai/v1/files \
-H "Authorization: Bearer tns_your_key_here" \
-F "file=@product-guide.pdf" \
-F "purpose=knowledge-base"
# Step 2: Create data source
curl https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources \
-H "Content-Type: application/json" \
-H "Authorization: Bearer tns_your_key_here" \
-d '{
"type": "file_upload",
"name": "Product Guide",
"config": {
"file_ids": ["file-abc123"]
}
}'
Python
import requests
API_BASE = "https://api.tensoras.ai/v1"
HEADERS = {
"Content-Type": "application/json",
"Authorization": "Bearer tns_your_key_here",
}
# Create a file upload data source
response = requests.post(
f"{API_BASE}/knowledge-bases/kb_abc123/data-sources",
headers=HEADERS,
json={
"type": "file_upload",
"name": "Product Guide",
"config": {
"file_ids": ["file-abc123"],
},
},
)
ds = response.json()
print(f"Data Source ID: {ds['id']}")
print(f"Status: {ds['status']}")
Node.js
const response = await fetch(
"https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
{
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: "Bearer tns_your_key_here",
},
body: JSON.stringify({
type: "file_upload",
name: "Product Guide",
config: {
file_ids: ["file-abc123"],
},
}),
}
);
const ds = await response.json();
console.log(`Data Source ID: ${ds.id}`);
console.log(`Status: ${ds.status}`);
Add a Web Crawl Data Source
curl
curl https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources \
-H "Content-Type: application/json" \
-H "Authorization: Bearer tns_your_key_here" \
-d '{
"type": "web_crawl",
"name": "Documentation Site",
"config": {
"urls": ["https://docs.example.com"],
"max_depth": 3,
"max_pages": 200,
"include_patterns": ["https://docs.example.com/guides/*"]
}
}'
Python
import requests
response = requests.post(
"https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer tns_your_key_here",
},
json={
"type": "web_crawl",
"name": "Documentation Site",
"config": {
"urls": ["https://docs.example.com"],
"max_depth": 3,
"max_pages": 200,
"include_patterns": ["https://docs.example.com/guides/*"],
},
},
)
ds = response.json()
print(f"Data Source ID: {ds['id']}")
print(f"Status: {ds['status']}")
Node.js
const response = await fetch(
"https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
{
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: "Bearer tns_your_key_here",
},
body: JSON.stringify({
type: "web_crawl",
name: "Documentation Site",
config: {
urls: ["https://docs.example.com"],
max_depth: 3,
max_pages: 200,
include_patterns: ["https://docs.example.com/guides/*"],
},
}),
}
);
const ds = await response.json();
console.log(`Data Source ID: ${ds.id}`);
console.log(`Status: ${ds.status}`);
List Data Sources
curl
curl https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources \
-H "Authorization: Bearer tns_your_key_here"
Python
import requests
response = requests.get(
"https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
headers={"Authorization": "Bearer tns_your_key_here"},
)
for ds in response.json()["data"]:
print(f"{ds['id']}: {ds['name']} ({ds['type']}) - {ds['status']}")
Node.js
const response = await fetch(
"https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
{
headers: { Authorization: "Bearer tns_your_key_here" },
}
);
const { data } = await response.json();
for (const ds of data) {
console.log(`${ds.id}: ${ds.name} (${ds.type}) - ${ds.status}`);
}
Error Handling
Failed requests return a JSON error object describing what went wrong. For example, supplying an unsupported type:
{
"error": {
"message": "Invalid data source type: 'dropbox'. Supported types: file_upload, web_crawl, s3, gcs, confluence, notion, google_drive",
"type": "invalid_request_error",
"param": "type",
"code": "invalid_value"
}
}
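In client code, the error object's `type`, `param`, and `code` fields make failures machine-readable. A sketch of surfacing them as a typed exception (the `TensorasAPIError` class is illustrative, not part of any SDK):

```python
class TensorasAPIError(Exception):
    """Wraps the API's error object for programmatic handling."""

    def __init__(self, message, error_type=None, param=None, code=None):
        super().__init__(message)
        self.error_type = error_type
        self.param = param
        self.code = code

def raise_for_api_error(payload):
    """Raise TensorasAPIError if the response body carries an error."""
    err = payload.get("error")
    if err:
        raise TensorasAPIError(
            err.get("message", "unknown error"),
            error_type=err.get("type"),
            param=err.get("param"),
            code=err.get("code"),
        )
    return payload
```

Checking `code` (e.g. "invalid_value") rather than matching on the human-readable message keeps the handling stable if message wording changes.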