Data Sources

Add and manage data sources for a knowledge base. Data sources define where documents are ingested from. Tensoras supports multiple source types including file uploads, web crawling, cloud storage, and third-party integrations.

Endpoints

POST https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources
GET  https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources
GET  https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources/{ds_id}
DELETE https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources/{ds_id}

Authentication

Authorization: Bearer tns_your_key_here

Create Data Source

Add a new data source to a knowledge base. Once created, an ingestion job is automatically started to process and index the documents from the source.

Request

POST /v1/knowledge-bases/{kb_id}/data-sources
Parameter | Type | Required | Description
--- | --- | --- | ---
kb_id | string | Yes | The knowledge base ID (path parameter).
type | string | Yes | The data source type. One of "file_upload", "web_crawl", "s3", "gcs", "confluence", "notion", "google_drive".
name | string | No | A human-readable name for the data source.
config | object | Yes | Source-specific configuration. See source types below.

Source Types

file_upload

Ingest documents from uploaded files. Upload files first using the Files API with purpose: "knowledge-base".

{
  "type": "file_upload",
  "config": {
    "file_ids": ["file-abc123", "file-def456"]
  }
}
Config Field | Type | Required | Description
--- | --- | --- | ---
file_ids | array | Yes | An array of file IDs to ingest.

web_crawl

Crawl and ingest content from web pages.

{
  "type": "web_crawl",
  "config": {
    "urls": ["https://docs.example.com"],
    "max_depth": 3,
    "max_pages": 100,
    "include_patterns": ["https://docs.example.com/*"],
    "exclude_patterns": ["*/changelog*"]
  }
}
Config Field | Type | Required | Default | Description
--- | --- | --- | --- | ---
urls | array | Yes | | Seed URLs to start crawling from.
max_depth | integer | No | 3 | Maximum link depth to follow from seed URLs.
max_pages | integer | No | 100 | Maximum number of pages to crawl.
include_patterns | array | No | | Glob patterns for URLs to include.
exclude_patterns | array | No | | Glob patterns for URLs to exclude.
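The include_patterns and exclude_patterns fields take shell-style glob patterns. As an illustration only (the crawler's exact matching semantics are server-side and may differ; the precedence of exclusions over inclusions here is an assumption, and url_allowed is a hypothetical helper, not part of the API), Python's fnmatch behaves like this:

```python
from fnmatch import fnmatch

def url_allowed(url, include_patterns=None, exclude_patterns=None):
    """Return True if a URL passes the include/exclude glob filters.

    Exclusions win over inclusions; with no include_patterns,
    every non-excluded URL is allowed.
    """
    if exclude_patterns and any(fnmatch(url, p) for p in exclude_patterns):
        return False
    if include_patterns:
        return any(fnmatch(url, p) for p in include_patterns)
    return True

print(url_allowed("https://docs.example.com/guides/intro",
                  include_patterns=["https://docs.example.com/*"],
                  exclude_patterns=["*/changelog*"]))   # True
print(url_allowed("https://docs.example.com/changelog/v2",
                  include_patterns=["https://docs.example.com/*"],
                  exclude_patterns=["*/changelog*"]))   # False
```

Note that a bare `*` in fnmatch matches across `/`, so "https://docs.example.com/*" matches arbitrarily deep paths.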

s3

Ingest documents from an Amazon S3 bucket.

{
  "type": "s3",
  "config": {
    "bucket": "my-docs-bucket",
    "prefix": "documents/",
    "region": "us-east-1",
    "credentials": {
      "access_key_id": "AKIA...",
      "secret_access_key": "..."
    }
  }
}
Config Field | Type | Required | Description
--- | --- | --- | ---
bucket | string | Yes | The S3 bucket name.
prefix | string | No | Object key prefix to filter files.
region | string | No | The AWS region. Defaults to "us-east-1".
credentials | object | Yes | AWS credentials with access_key_id and secret_access_key.

gcs

Ingest documents from a Google Cloud Storage bucket.

{
  "type": "gcs",
  "config": {
    "bucket": "my-docs-bucket",
    "prefix": "documents/",
    "credentials": {
      "service_account_json": "..."
    }
  }
}
Config Field | Type | Required | Description
--- | --- | --- | ---
bucket | string | Yes | The GCS bucket name.
prefix | string | No | Object prefix to filter files.
credentials | object | Yes | GCS credentials with service_account_json.

confluence

Ingest pages from an Atlassian Confluence workspace.

{
  "type": "confluence",
  "config": {
    "url": "https://your-domain.atlassian.net",
    "space_keys": ["ENG", "PRODUCT"],
    "credentials": {
      "email": "user@example.com",
      "api_token": "..."
    }
  }
}
Config Field | Type | Required | Description
--- | --- | --- | ---
url | string | Yes | The Confluence instance URL.
space_keys | array | No | Specific space keys to ingest. Omit to ingest all accessible spaces.
credentials | object | Yes | Confluence credentials with email and api_token.

notion

Ingest pages from a Notion workspace.

{
  "type": "notion",
  "config": {
    "page_ids": ["page-id-1", "page-id-2"],
    "database_ids": ["db-id-1"],
    "credentials": {
      "integration_token": "secret_..."
    }
  }
}
Config Field | Type | Required | Description
--- | --- | --- | ---
page_ids | array | No | Specific Notion page IDs to ingest.
database_ids | array | No | Specific Notion database IDs to ingest.
credentials | object | Yes | Notion credentials with integration_token.

google_drive

Ingest documents from Google Drive.

{
  "type": "google_drive",
  "config": {
    "folder_ids": ["folder-id-1"],
    "include_shared_drives": true,
    "credentials": {
      "service_account_json": "..."
    }
  }
}
Config Field | Type | Required | Description
--- | --- | --- | ---
folder_ids | array | No | Specific Google Drive folder IDs. Omit to ingest all accessible files.
include_shared_drives | boolean | No | Whether to include shared drives. Default: false.
credentials | object | Yes | Google credentials with service_account_json.
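All seven source types share the same request envelope: a required type, a required type-specific config, and an optional name. A minimal client-side sketch of assembling that body (the helper and its local validation are illustrative, not part of the API):

```python
SUPPORTED_TYPES = {"file_upload", "web_crawl", "s3", "gcs",
                   "confluence", "notion", "google_drive"}

def data_source_body(source_type, config, name=None):
    """Build the JSON body for POST .../data-sources."""
    if source_type not in SUPPORTED_TYPES:
        raise ValueError(f"Invalid data source type: {source_type!r}")
    body = {"type": source_type, "config": config}
    if name is not None:
        body["name"] = name
    return body

body = data_source_body(
    "web_crawl",
    {"urls": ["https://docs.example.com"], "max_depth": 3},
    name="Documentation Site",
)
```

Validating the type locally mirrors the invalid_value error the API returns for unsupported types (see Error Handling below).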

Response Body

{
  "id": "ds_abc123",
  "object": "data_source",
  "knowledge_base_id": "kb_abc123",
  "type": "file_upload",
  "name": "Product PDFs",
  "config": {
    "file_ids": ["file-abc123", "file-def456"]
  },
  "status": "processing",
  "document_count": 0,
  "created_at": 1709123456,
  "updated_at": 1709123456,
  "last_synced_at": null
}
Field | Type | Description
--- | --- | ---
id | string | The unique data source identifier.
object | string | Always "data_source".
knowledge_base_id | string | The ID of the parent knowledge base.
type | string | The data source type.
name | string | The name of the data source.
config | object | The source-specific configuration (credentials are redacted).
status | string | The current status. One of "processing", "active", "error".
document_count | integer | The number of documents ingested from this source.
created_at | integer | Unix timestamp of when the data source was created.
updated_at | integer | Unix timestamp of the last update.
last_synced_at | integer or null | Unix timestamp of the last successful sync.
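A new data source starts in "processing" and moves to "active" (or "error") once ingestion finishes. One way to wait for that transition is to poll Get Data Source; in this sketch, fetch_source stands in for whatever GET call you use, and the interval and timeout values are arbitrary choices, not API requirements:

```python
import time

def wait_for_ingestion(fetch_source, interval=5.0, timeout=300.0):
    """Poll until the data source leaves the "processing" status.

    fetch_source: zero-argument callable returning the parsed data
    source object, e.g. a GET on .../data-sources/{ds_id}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        ds = fetch_source()
        if ds["status"] != "processing":
            return ds
        time.sleep(interval)
    raise TimeoutError("data source is still processing after timeout")
```

Returning the full object lets the caller distinguish "active" from "error" and read document_count in one place.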

List Data Sources

Retrieve all data sources for a knowledge base.

Request

GET /v1/knowledge-bases/{kb_id}/data-sources
Parameter | Type | Required | Description
--- | --- | --- | ---
kb_id | string | Yes | The knowledge base ID (path parameter).
limit | integer | No | Maximum number of results. Default: 20, max: 100.
after | string | No | Cursor for pagination.
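To walk a paginated listing, pass a cursor from one page into the after parameter of the next. The sketch below assumes the cursor is the id of the last item on the previous page (a common convention for cursor pagination; confirm against your own responses), and fetch_page stands in for the GET call:

```python
def iter_data_sources(fetch_page):
    """Yield every data source across pages.

    fetch_page: callable taking an `after` cursor (or None for the
    first page) and returning the parsed list response.
    """
    after = None
    while True:
        page = fetch_page(after)
        yield from page["data"]
        if not page["has_more"] or not page["data"]:
            return
        after = page["data"][-1]["id"]
```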

Response Body

{
  "object": "list",
  "data": [
    {
      "id": "ds_abc123",
      "object": "data_source",
      "knowledge_base_id": "kb_abc123",
      "type": "file_upload",
      "name": "Product PDFs",
      "status": "active",
      "document_count": 15,
      "created_at": 1709123456,
      "last_synced_at": 1709145056
    }
  ],
  "has_more": false
}

Get Data Source

Retrieve details about a specific data source.

Request

GET /v1/knowledge-bases/{kb_id}/data-sources/{ds_id}
Parameter | Type | Required | Description
--- | --- | --- | ---
kb_id | string | Yes | The knowledge base ID (path parameter).
ds_id | string | Yes | The data source ID (path parameter).

Response Body

Returns a single data source object (same schema as the create response).


Delete Data Source

Delete a data source and remove all documents that were ingested from it.

Request

DELETE /v1/knowledge-bases/{kb_id}/data-sources/{ds_id}
Parameter | Type | Required | Description
--- | --- | --- | ---
kb_id | string | Yes | The knowledge base ID (path parameter).
ds_id | string | Yes | The data source ID (path parameter).

Response Body

{
  "id": "ds_abc123",
  "object": "data_source",
  "deleted": true
}

Examples

Add Files to a Knowledge Base

curl

# Step 1: Upload files
curl https://api.tensoras.ai/v1/files \
  -H "Authorization: Bearer tns_your_key_here" \
  -F "file=@product-guide.pdf" \
  -F "purpose=knowledge-base"
 
# Step 2: Create data source
curl https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "type": "file_upload",
    "name": "Product Guide",
    "config": {
      "file_ids": ["file-abc123"]
    }
  }'

Python

import requests
 
API_BASE = "https://api.tensoras.ai/v1"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": "Bearer tns_your_key_here",
}
 
# Create a file upload data source
response = requests.post(
    f"{API_BASE}/knowledge-bases/kb_abc123/data-sources",
    headers=HEADERS,
    json={
        "type": "file_upload",
        "name": "Product Guide",
        "config": {
            "file_ids": ["file-abc123"],
        },
    },
)
 
ds = response.json()
print(f"Data Source ID: {ds['id']}")
print(f"Status: {ds['status']}")

Node.js

const response = await fetch(
  "https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer tns_your_key_here",
    },
    body: JSON.stringify({
      type: "file_upload",
      name: "Product Guide",
      config: {
        file_ids: ["file-abc123"],
      },
    }),
  }
);
 
const ds = await response.json();
console.log(`Data Source ID: ${ds.id}`);
console.log(`Status: ${ds.status}`);

Add a Web Crawl Data Source

curl

curl https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tns_your_key_here" \
  -d '{
    "type": "web_crawl",
    "name": "Documentation Site",
    "config": {
      "urls": ["https://docs.example.com"],
      "max_depth": 3,
      "max_pages": 200,
      "include_patterns": ["https://docs.example.com/guides/*"]
    }
  }'

Python

import requests
 
response = requests.post(
    "https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer tns_your_key_here",
    },
    json={
        "type": "web_crawl",
        "name": "Documentation Site",
        "config": {
            "urls": ["https://docs.example.com"],
            "max_depth": 3,
            "max_pages": 200,
            "include_patterns": ["https://docs.example.com/guides/*"],
        },
    },
)
 
ds = response.json()
print(f"Data Source ID: {ds['id']}")
print(f"Status: {ds['status']}")

Node.js

const response = await fetch(
  "https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer tns_your_key_here",
    },
    body: JSON.stringify({
      type: "web_crawl",
      name: "Documentation Site",
      config: {
        urls: ["https://docs.example.com"],
        max_depth: 3,
        max_pages: 200,
        include_patterns: ["https://docs.example.com/guides/*"],
      },
    }),
  }
);
 
const ds = await response.json();
console.log(`Data Source ID: ${ds.id}`);
console.log(`Status: ${ds.status}`);

List Data Sources

curl

curl https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources \
  -H "Authorization: Bearer tns_your_key_here"

Python

import requests
 
response = requests.get(
    "https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
    headers={"Authorization": "Bearer tns_your_key_here"},
)
 
for ds in response.json()["data"]:
    print(f"{ds['id']}: {ds['name']} ({ds['type']}) - {ds['status']}")

Node.js

const response = await fetch(
  "https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
  {
    headers: { Authorization: "Bearer tns_your_key_here" },
  }
);
 
const { data } = await response.json();
for (const ds of data) {
  console.log(`${ds.id}: ${ds.name} (${ds.type}) - ${ds.status}`);
}

Error Handling

{
  "error": {
    "message": "Invalid data source type: 'dropbox'. Supported types: file_upload, web_crawl, s3, gcs, confluence, notion, google_drive",
    "type": "invalid_request_error",
    "param": "type",
    "code": "invalid_value"
  }
}
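Errors arrive in this envelope alongside an HTTP 4xx/5xx status. A minimal sketch of surfacing them client-side (the helper is illustrative; the status-code threshold is the usual HTTP convention, not something this API documents):

```python
def raise_for_api_error(status_code, body):
    """Return the body on success; raise on an API error envelope."""
    if status_code < 400:
        return body
    err = body.get("error", {})
    raise ValueError(
        f"{err.get('type', 'api_error')}"
        f" (param={err.get('param')}, code={err.get('code')}):"
        f" {err.get('message')}"
    )
```

Keeping type, param, and code in the exception message makes retry/validation decisions possible without re-parsing the response.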