Data Sources
Add and manage data sources for a knowledge base. Data sources define where documents are ingested from. Tensoras supports multiple source types including file uploads, web crawling, cloud storage, and third-party integrations.
Endpoints
POST https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources
GET https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources
GET https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources/{ds_id}
DELETE https://api.tensoras.ai/v1/knowledge-bases/{kb_id}/data-sources/{ds_id}
Authentication
All requests require a bearer API key in the Authorization header:
Authorization: Bearer tns_your_key_here
Create Data Source
Add a new data source to a knowledge base. Once created, an ingestion job is automatically started to process and index the documents from the source.
Request
POST /v1/knowledge-bases/{kb_id}/data-sources

| Parameter | Type | Required | Description |
|---|---|---|---|
| kb_id | string | Yes | The knowledge base ID (path parameter). |
| type | string | Yes | The data source type. One of "file_upload", "web_crawl", "s3", "gcs", "confluence", "notion", "google_drive". |
| name | string | No | A human-readable name for the data source. |
| config | object | Yes | Source-specific configuration. See source types below. |
Source Types
file_upload
Ingest documents from uploaded files. Upload files first using the Files API with purpose: "knowledge-base".
{
"type": "file_upload",
"config": {
"file_ids": ["file-abc123", "file-def456"]
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| file_ids | array | Yes | An array of file IDs to ingest. |
web_crawl
Crawl and ingest content from web pages.
{
"type": "web_crawl",
"config": {
"urls": ["https://docs.example.com"],
"max_depth": 3,
"max_pages": 100,
"include_patterns": ["https://docs.example.com/*"],
"exclude_patterns": ["*/changelog*"]
}
}

| Config Field | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | array | Yes | — | Seed URLs to start crawling from. |
| max_depth | integer | No | 3 | Maximum link depth to follow from seed URLs. |
| max_pages | integer | No | 100 | Maximum number of pages to crawl. |
| include_patterns | array | No | — | Glob patterns for URLs to include. |
| exclude_patterns | array | No | — | Glob patterns for URLs to exclude. |
s3
Ingest documents from an Amazon S3 bucket.
{
"type": "s3",
"config": {
"bucket": "my-docs-bucket",
"prefix": "documents/",
"region": "us-east-1",
"credentials": {
"access_key_id": "AKIA...",
"secret_access_key": "..."
}
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| bucket | string | Yes | The S3 bucket name. |
| prefix | string | No | Object key prefix to filter files. |
| region | string | No | The AWS region. Defaults to "us-east-1". |
| credentials | object | Yes | AWS credentials with access_key_id and secret_access_key. |
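An s3 source is created with the same POST request as any other type; only the config differs. A minimal Python sketch (the `build_s3_body` and `create_data_source` helpers below are illustrative, not part of any SDK):

```python
import requests

API_BASE = "https://api.tensoras.ai/v1"

def build_s3_body(bucket, access_key_id, secret_access_key,
                  prefix=None, region=None, name=None):
    """Assemble the request body for an s3 data source."""
    config = {
        "bucket": bucket,
        "credentials": {
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
        },
    }
    if prefix:
        config["prefix"] = prefix
    if region:
        config["region"] = region  # server defaults to "us-east-1" if omitted
    body = {"type": "s3", "config": config}
    if name:
        body["name"] = name
    return body

def create_data_source(kb_id, body, api_key):
    """POST the body to /v1/knowledge-bases/{kb_id}/data-sources."""
    resp = requests.post(
        f"{API_BASE}/knowledge-bases/{kb_id}/data-sources",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json=body,
    )
    resp.raise_for_status()
    return resp.json()
```

Keeping the body builder separate from the HTTP call makes the config easy to validate or log (with credentials scrubbed) before sending.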
gcs
Ingest documents from a Google Cloud Storage bucket.
{
"type": "gcs",
"config": {
"bucket": "my-docs-bucket",
"prefix": "documents/",
"credentials": {
"service_account_json": "..."
}
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| bucket | string | Yes | The GCS bucket name. |
| prefix | string | No | Object prefix to filter files. |
| credentials | object | Yes | GCS credentials with service_account_json. |
confluence
Ingest pages from an Atlassian Confluence workspace.
{
"type": "confluence",
"config": {
"url": "https://your-domain.atlassian.net",
"space_keys": ["ENG", "PRODUCT"],
"credentials": {
"email": "user@example.com",
"api_token": "..."
}
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The Confluence instance URL. |
| space_keys | array | No | Specific space keys to ingest. Omit to ingest all accessible spaces. |
| credentials | object | Yes | Confluence credentials with email and api_token. |
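A small Python helper for assembling the confluence request body (the function name and signature are illustrative; pair it with a POST to the data-sources endpoint as shown in the Examples section):

```python
def build_confluence_body(url, email, api_token, space_keys=None, name=None):
    """Assemble the request body for a confluence data source.

    Omitting space_keys ingests every space the account can read,
    so passing an explicit list is usually safer for large instances.
    """
    config = {
        "url": url,
        "credentials": {"email": email, "api_token": api_token},
    }
    if space_keys:
        config["space_keys"] = list(space_keys)
    body = {"type": "confluence", "config": config}
    if name:
        body["name"] = name
    return body
```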
notion
Ingest pages from a Notion workspace.
{
"type": "notion",
"config": {
"page_ids": ["page-id-1", "page-id-2"],
"database_ids": ["db-id-1"],
"credentials": {
"integration_token": "secret_..."
}
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| page_ids | array | No | Specific Notion page IDs to ingest. |
| database_ids | array | No | Specific Notion database IDs to ingest. |
| credentials | object | Yes | Notion credentials with integration_token. |
google_drive
Ingest documents from Google Drive.
{
"type": "google_drive",
"config": {
"folder_ids": ["folder-id-1"],
"include_shared_drives": true,
"credentials": {
"service_account_json": "..."
}
}
}

| Config Field | Type | Required | Description |
|---|---|---|---|
| folder_ids | array | No | Specific Google Drive folder IDs. Omit to ingest all accessible files. |
| include_shared_drives | boolean | No | Whether to include shared drives. Default: false. |
| credentials | object | Yes | Google credentials with service_account_json. |
Response Body
{
"id": "ds_abc123",
"object": "data_source",
"knowledge_base_id": "kb_abc123",
"type": "file_upload",
"name": "Product PDFs",
"config": {
"file_ids": ["file-abc123", "file-def456"]
},
"status": "processing",
"document_count": 0,
"created_at": 1709123456,
"updated_at": 1709123456,
"last_synced_at": null
}

| Field | Type | Description |
|---|---|---|
| id | string | The unique data source identifier. |
| object | string | Always "data_source". |
| knowledge_base_id | string | The ID of the parent knowledge base. |
| type | string | The data source type. |
| name | string | The name of the data source. |
| config | object | The source-specific configuration (credentials are redacted). |
| status | string | The current status. One of "processing", "active", "error". |
| document_count | integer | The number of documents ingested from this source. |
| created_at | integer | Unix timestamp of when the data source was created. |
| updated_at | integer | Unix timestamp of the last update. |
| last_synced_at | integer or null | Unix timestamp of the last successful sync. |
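Because ingestion runs asynchronously, a newly created source starts in "processing". One way to wait for ingestion to finish is to poll the get endpoint until the status leaves "processing"; a sketch (the helper names, interval, and timeout are illustrative, not part of any SDK):

```python
import time
import requests

API_BASE = "https://api.tensoras.ai/v1"

def is_settled(status):
    """A data source is settled once it is "active" or "error"."""
    return status in ("active", "error")

def wait_until_settled(kb_id, ds_id, api_key, interval=5.0, timeout=600.0):
    """Poll the data source until ingestion finishes or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(
            f"{API_BASE}/knowledge-bases/{kb_id}/data-sources/{ds_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        resp.raise_for_status()
        ds = resp.json()
        if is_settled(ds["status"]):
            return ds  # inspect ds["status"] and ds["document_count"]
        time.sleep(interval)
    raise TimeoutError(f"{ds_id} still processing after {timeout}s")
```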
List Data Sources
Retrieve all data sources for a knowledge base.
Request
GET /v1/knowledge-bases/{kb_id}/data-sources

| Parameter | Type | Required | Description |
|---|---|---|---|
| kb_id | string | Yes | The knowledge base ID (path parameter). |
| limit | integer | No | Maximum number of results. Default: 20, max: 100. |
| after | string | No | Cursor for pagination. |
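The `after` cursor can be combined with `limit` to walk every page. A sketch of a cursor-driven loop, under the assumption that the cursor is the `id` of the last item on the previous page (rely on the returned `has_more` flag, not the page size):

```python
def paginate(fetch_page):
    """Yield items from every page of a cursor-paginated list.

    fetch_page(after) must return a dict with "data" and "has_more"
    keys, matching the list response shape below.
    """
    after = None
    while True:
        page = fetch_page(after)
        yield from page["data"]
        if not page["has_more"] or not page["data"]:
            break
        after = page["data"][-1]["id"]
```

With the requests library, `fetch_page` would GET this endpoint with `params={"limit": 100, "after": after}` (omitting `after` on the first call).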
Response Body
{
"object": "list",
"data": [
{
"id": "ds_abc123",
"object": "data_source",
"knowledge_base_id": "kb_abc123",
"type": "file_upload",
"name": "Product PDFs",
"status": "active",
"document_count": 15,
"created_at": 1709123456,
"last_synced_at": 1709145056
}
],
"has_more": false
}
Get Data Source
Retrieve details about a specific data source.
Request
GET /v1/knowledge-bases/{kb_id}/data-sources/{ds_id}

| Parameter | Type | Required | Description |
|---|---|---|---|
| kb_id | string | Yes | The knowledge base ID (path parameter). |
| ds_id | string | Yes | The data source ID (path parameter). |
Response Body
Returns a single data source object (same schema as the create response).
Delete Data Source
Delete a data source and remove all documents that were ingested from it.
Request
DELETE /v1/knowledge-bases/{kb_id}/data-sources/{ds_id}

| Parameter | Type | Required | Description |
|---|---|---|---|
| kb_id | string | Yes | The knowledge base ID (path parameter). |
| ds_id | string | Yes | The data source ID (path parameter). |
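Since deleting a source also removes every document ingested from it, it is worth centralizing the call so IDs can be checked first. A minimal Python sketch (the helper names are illustrative):

```python
import requests

API_BASE = "https://api.tensoras.ai/v1"

def data_source_url(kb_id, ds_id):
    """Build the URL for a single data source."""
    return f"{API_BASE}/knowledge-bases/{kb_id}/data-sources/{ds_id}"

def delete_data_source(kb_id, ds_id, api_key):
    """Delete a data source and all documents ingested from it."""
    resp = requests.delete(
        data_source_url(kb_id, ds_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    resp.raise_for_status()
    return resp.json()  # expected shape: {"id": ..., "deleted": true}
```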
Response Body
{
"id": "ds_abc123",
"object": "data_source",
"deleted": true
}
Examples
Add Files to a Knowledge Base
curl
# Step 1: Upload files
curl https://api.tensoras.ai/v1/files \
-H "Authorization: Bearer tns_your_key_here" \
-F "file=@product-guide.pdf" \
-F "purpose=knowledge-base"
# Step 2: Create data source
curl https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources \
-H "Content-Type: application/json" \
-H "Authorization: Bearer tns_your_key_here" \
-d '{
"type": "file_upload",
"name": "Product Guide",
"config": {
"file_ids": ["file-abc123"]
}
}'
Python
import requests
API_BASE = "https://api.tensoras.ai/v1"
HEADERS = {
"Content-Type": "application/json",
"Authorization": "Bearer tns_your_key_here",
}
# Create a file upload data source
response = requests.post(
f"{API_BASE}/knowledge-bases/kb_abc123/data-sources",
headers=HEADERS,
json={
"type": "file_upload",
"name": "Product Guide",
"config": {
"file_ids": ["file-abc123"],
},
},
)
ds = response.json()
print(f"Data Source ID: {ds['id']}")
print(f"Status: {ds['status']}")
Node.js
const response = await fetch(
"https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
{
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: "Bearer tns_your_key_here",
},
body: JSON.stringify({
type: "file_upload",
name: "Product Guide",
config: {
file_ids: ["file-abc123"],
},
}),
}
);
const ds = await response.json();
console.log(`Data Source ID: ${ds.id}`);
console.log(`Status: ${ds.status}`);
Add a Web Crawl Data Source
curl
curl https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources \
-H "Content-Type: application/json" \
-H "Authorization: Bearer tns_your_key_here" \
-d '{
"type": "web_crawl",
"name": "Documentation Site",
"config": {
"urls": ["https://docs.example.com"],
"max_depth": 3,
"max_pages": 200,
"include_patterns": ["https://docs.example.com/guides/*"]
}
}'
Python
import requests
response = requests.post(
"https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer tns_your_key_here",
},
json={
"type": "web_crawl",
"name": "Documentation Site",
"config": {
"urls": ["https://docs.example.com"],
"max_depth": 3,
"max_pages": 200,
"include_patterns": ["https://docs.example.com/guides/*"],
},
},
)
ds = response.json()
print(f"Data Source ID: {ds['id']}")
print(f"Status: {ds['status']}")
Node.js
const response = await fetch(
"https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
{
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: "Bearer tns_your_key_here",
},
body: JSON.stringify({
type: "web_crawl",
name: "Documentation Site",
config: {
urls: ["https://docs.example.com"],
max_depth: 3,
max_pages: 200,
include_patterns: ["https://docs.example.com/guides/*"],
},
}),
}
);
const ds = await response.json();
console.log(`Data Source ID: ${ds.id}`);
console.log(`Status: ${ds.status}`);
List Data Sources
curl
curl https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources \
-H "Authorization: Bearer tns_your_key_here"
Python
import requests
response = requests.get(
"https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
headers={"Authorization": "Bearer tns_your_key_here"},
)
for ds in response.json()["data"]:
print(f"{ds['id']}: {ds['name']} ({ds['type']}) - {ds['status']}")
Node.js
const response = await fetch(
"https://api.tensoras.ai/v1/knowledge-bases/kb_abc123/data-sources",
{
headers: { Authorization: "Bearer tns_your_key_here" },
}
);
const { data } = await response.json();
for (const ds of data) {
console.log(`${ds.id}: ${ds.name} (${ds.type}) - ${ds.status}`);
}
Error Handling
Failed requests return a JSON error object describing what went wrong. For example, supplying an unsupported type:
{
"error": {
"message": "Invalid data source type: 'dropbox'. Supported types: file_upload, web_crawl, s3, gcs, confluence, notion, google_drive",
"type": "invalid_request_error",
"param": "type",
"code": "invalid_value"
}
}
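In client code, the error object's `type`, `param`, and `code` fields make failures machine-readable. A sketch of surfacing them as a typed exception (the `TensorasAPIError` class is illustrative, not part of any SDK):

```python
class TensorasAPIError(Exception):
    """Wraps the API's error object for programmatic handling."""

    def __init__(self, message, error_type=None, param=None, code=None):
        super().__init__(message)
        self.error_type = error_type
        self.param = param
        self.code = code

def raise_for_api_error(payload):
    """Raise TensorasAPIError if the response body carries an error."""
    err = payload.get("error")
    if err:
        raise TensorasAPIError(
            err.get("message", "unknown error"),
            error_type=err.get("type"),
            param=err.get("param"),
            code=err.get("code"),
        )
    return payload
```

Checking `code` (e.g. "invalid_value") rather than matching on the human-readable message keeps the handling stable if message wording changes.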