Responses
Create a response using the agentic tool-calling loop. The server runs a multi-turn loop where the model can issue tool calls (e.g., file_search against knowledge bases), the server executes built-in tools, feeds results back, and the model produces a final text response.
This is the recommended endpoint for agentic workflows that combine LLM reasoning with knowledge base retrieval.
Endpoints
POST https://api.tensoras.ai/v1/responses
GET https://api.tensoras.ai/v1/responses/{response_id}
Authentication
Include your API key in the Authorization header:
Authorization: Bearer tns_your_key_here
Create a Response
Responses are stored for 24 hours. After this window, GET /v1/responses/{id} returns 404.
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | The model to use (e.g. llama-3.3-70b). |
| input | string or array | Yes | — | A text prompt or list of input messages. See input format below. |
| instructions | string | No | — | System instructions prepended to the conversation. |
| tools | array | No | — | Tool definitions. See tools below. |
| tool_choice | string or object | No | "auto" | Controls which tools are called: "auto", "none", or "required". |
| max_output_tokens | integer | No | Model default | Maximum output tokens per LLM call. |
| temperature | number | No | — | Sampling temperature between 0 and 2. |
| top_p | number | No | — | Nucleus sampling parameter. |
| stream | boolean | No | false | If true, emit server-sent events during execution. |
| max_turns | integer | No | 10 | Maximum agentic turns (1–50). Each tool call + response counts as one turn. |
| metadata | object | No | — | Arbitrary key-value metadata attached to the response. |
| user | string | No | — | End-user identifier for tracking. |
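Pulling the table together, a request body that exercises the optional parameters might look like the sketch below (field values are illustrative, not recommendations):

```python
import json

# Illustrative request body; values chosen only to demonstrate the fields above.
payload = {
    "model": "llama-3.3-70b",
    "input": "Summarize our onboarding docs.",  # string form; a message list also works
    "instructions": "Answer concisely.",
    "tool_choice": "auto",    # "none" disables tools, "required" forces a tool call
    "max_turns": 5,           # cap the agentic loop below the default of 10
    "stream": False,
    "metadata": {"team": "docs"},
    "user": "user_123",
}

body = json.dumps(payload)  # what gets POSTed to /v1/responses
```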
Input Format
String input — simple text prompt:
{
  "model": "llama-3.3-70b",
  "input": "What is machine learning?"
}
Message list input — multi-turn conversation:
{
  "model": "llama-3.3-70b",
  "input": [
    { "role": "system", "content": "You are a helpful research assistant." },
    { "role": "user", "content": "Summarize the latest findings on RAG." }
  ]
}
Each message has the same format as chat completion messages.
Tools
file_search
Automatically searches your knowledge bases and feeds results back to the model:
{
  "type": "file_search",
  "file_search": {
    "knowledge_base_ids": ["kb_abc123", "kb_def456"],
    "max_results": 10,
    "search_type": "hybrid",
    "score_threshold": 0.0,
    "rerank": false
  }
}

| Field | Type | Default | Description |
|---|---|---|---|
| knowledge_base_ids | array | — | Required. Knowledge base IDs to search. |
| max_results | integer | 10 | Maximum results to retrieve (1–50). |
| search_type | string | "hybrid" | One of "semantic", "hybrid", "keyword". |
| score_threshold | number | 0.0 | Minimum relevance score (0–1). |
| rerank | boolean | false | Whether to rerank results. |
| rerank_model | string | — | Reranking model to use. |
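A client can sanity-check a file_search configuration against the bounds in the table before sending it. The validator below is a hypothetical helper, not part of the SDK; the server enforces its own rules regardless:

```python
VALID_SEARCH_TYPES = {"semantic", "hybrid", "keyword"}

def validate_file_search(cfg: dict) -> dict:
    """Check a file_search config against the documented bounds and wrap it
    in a tool definition. Client-side convenience only."""
    if not cfg.get("knowledge_base_ids"):
        raise ValueError("knowledge_base_ids is required")
    if not 1 <= cfg.get("max_results", 10) <= 50:
        raise ValueError("max_results must be between 1 and 50")
    if cfg.get("search_type", "hybrid") not in VALID_SEARCH_TYPES:
        raise ValueError("search_type must be semantic, hybrid, or keyword")
    if not 0.0 <= cfg.get("score_threshold", 0.0) <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1")
    return {"type": "file_search", "file_search": cfg}
```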
function
Define custom functions for the model to call. The server records the call but does not execute it; custom functions are returned in the output for client-side handling.
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": { "type": "string" }
      },
      "required": ["city"]
    }
  }
}
Response Body
{
  "id": "resp_abc123def456",
  "object": "response",
  "created_at": 1709123456,
  "model": "llama-3.3-70b",
  "status": "completed",
  "output": [
    {
      "type": "file_search_call",
      "id": "fs_abc123",
      "queries": ["machine learning fundamentals"],
      "results": [
        {
          "kb_id": "kb_abc123",
          "document_id": "doc_123",
          "document_name": "ml-overview.pdf",
          "text": "Machine learning is a subset of artificial intelligence...",
          "score": 0.95
        }
      ],
      "status": "completed"
    },
    {
      "type": "message",
      "id": "msg_xyz789",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Machine learning is a subset of artificial intelligence that enables systems to learn from data...",
          "annotations": []
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 250,
    "output_tokens": 120,
    "total_tokens": 370
  },
  "metadata": null,
  "tool_call_count": 1,
  "turn_count": 2
}

| Field | Type | Description |
|---|---|---|
| id | string | Unique response identifier (prefixed with resp_). |
| object | string | Always "response". |
| created_at | integer | Unix timestamp of creation. |
| model | string | The model used. |
| status | string | "completed" — finished normally. "incomplete" — hit max_turns or max_output_tokens before finishing. "failed" — an internal error occurred. |
| output | array | Ordered list of output items. See output items. |
| usage | object | Aggregated token usage across all turns. |
| metadata | object | Metadata provided in the request, or null. |
| tool_call_count | integer | Total number of tool calls made. |
| turn_count | integer | Number of agentic turns executed. |
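For a non-streaming call, the useful parts of this body are usually the assistant text and the retrieved sources. One way to pull both out of the JSON shape shown above (an illustrative helper, not SDK API):

```python
def extract_answer(response: dict):
    """Walk the output array and collect assistant text plus
    (document_name, score) pairs from any file_search_call items."""
    answer_parts, sources = [], []
    for item in response.get("output", []):
        if item["type"] == "message":
            for part in item["content"]:
                if part["type"] == "output_text":
                    answer_parts.append(part["text"])
        elif item["type"] == "file_search_call":
            for r in item.get("results", []):
                sources.append((r["document_name"], r["score"]))
    return "\n".join(answer_parts), sources
```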
Output Items
The output array contains items of different types:
message — The model’s text response:
| Field | Type | Description |
|---|---|---|
| type | string | "message" |
| id | string | Unique message ID. |
| role | string | "assistant" |
| content | array | List of content parts, each with type, text, and annotations. |
| status | string | "completed" or "incomplete". |
file_search_call — A knowledge base search executed by the server:
| Field | Type | Description |
|---|---|---|
| type | string | "file_search_call" |
| id | string | Unique search call ID. |
| queries | array | The search queries used. |
| results | array | Retrieved documents with kb_id, document_id, document_name, text, score. |
| status | string | "completed" or "incomplete". |
function_call — A custom function call requested by the model:
| Field | Type | Description |
|---|---|---|
| type | string | "function_call" |
| id | string | Unique call ID. |
| call_id | string | The tool call ID from the model. |
| name | string | Function name. |
| arguments | string | JSON-encoded arguments. |
| status | string | "completed" or "incomplete". Never "failed". |
Important: Custom functions are not executed server-side. When a function_call item appears in the output, you must execute the function yourself and return the result. See Handling Function Call Outputs below.
Streaming
When stream is set to true, the response is delivered as server-sent events (SSE). Each event is a data: line with a JSON payload, and the event type is indicated by the type field in the payload. The stream terminates with data: [DONE].
SSE event types emitted in order:
| Event type | Description |
|---|---|
| response.created | Initial response shell (status "incomplete"). Emitted immediately after the request is accepted. |
| response.output_item.added | A new output item (tool call or message) has started. Includes the partial item. |
| response.output_text.delta | A text content delta for the current message. The delta field contains the new text fragment. |
| response.output_text.done | The full text for the current message is complete. |
| response.output_item.done | An output item is fully complete. Includes the final item object. |
| response.completed | Final event — the full response object with status: "completed". |
| [DONE] | Stream termination sentinel. Not a JSON object. |
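A minimal consumer of this stream can be sketched as follows. It assumes each SSE line arrives as text and that the response.completed payload is usable as the final result; the exact envelope of that event is an assumption here, so treat this as a starting point:

```python
import json

def consume_sse(lines):
    """Accumulate output_text deltas from a Responses SSE stream.

    `lines` is an iterable of raw "data: ..." lines. Returns the
    concatenated text and the final event payload (or None)."""
    text_parts = []
    final = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # termination sentinel, not JSON
        event = json.loads(payload)
        if event["type"] == "response.output_text.delta":
            text_parts.append(event["delta"])
        elif event["type"] == "response.completed":
            final = event  # carries the full response object
    return "".join(text_parts), final
```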
Handling Function Call Outputs
When the model issues a custom function call, the output contains a function_call item. Your client must inspect the output and execute the function itself. If you want the model to continue reasoning with the result, re-submit the conversation with the result appended as a tool-role message.
from tensoras import Tensoras
import json

client = Tensoras(api_key="tns_...")

# 1. Create a response with a custom function tool
response = client.responses.create(
    model="llama-3.3-70b",
    input="What is the weather in Paris?",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    }],
)

# 2. Inspect output for function_call items
for item in response.output:
    if item["type"] == "function_call":
        fn_name = item["name"]
        fn_args = json.loads(item["arguments"])
        call_id = item["call_id"]

        # 3. Execute the function client-side
        if fn_name == "get_weather":
            result = {"temperature": "15°C", "condition": "cloudy"}

            # 4. (Optional) Re-submit with the result for further model reasoning
            follow_up = client.responses.create(
                model="llama-3.3-70b",
                input=[
                    {"role": "user", "content": "What is the weather in Paris?"},
                    {"role": "assistant", "tool_calls": [{"id": call_id, "type": "function",
                        "function": {"name": fn_name, "arguments": item["arguments"]}}]},
                    {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)},
                ],
            )
            for follow_item in follow_up.output:
                if follow_item["type"] == "message":
                    print(follow_item["content"][0]["text"])
Error Responses
All errors use the standard JSON error envelope:
400 Bad Request — invalid parameters or content policy violation:
{
  "error": {
    "message": "Field 'model' is required.",
    "type": "invalid_request_error",
    "param": "model",
    "code": "missing_required_field"
  }
}
402 Payment Required — insufficient credits or billing issue:
{
  "error": {
    "message": "Insufficient credits. Please add credits to your account.",
    "type": "billing_error",
    "param": null,
    "code": "insufficient_credits"
  }
}
429 Too Many Requests — rate limit exceeded:
{
  "error": {
    "message": "Rate limit exceeded. Please slow down your requests.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
Retrieve a Response
Retrieve a previously-created response by its ID. Responses are stored for 24 hours. After this window, the endpoint returns 404.
GET https://api.tensoras.ai/v1/responses/{response_id}
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| response_id | string | The response ID (e.g. resp_abc123). |
Response
Returns the same response body as the create endpoint, or 404 if the response does not exist or has expired.
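Client code typically wants to retry 429s and treat 404 as expiry when polling this endpoint. A transport-agnostic sketch, where get_response_with_retry and the injected fetch callable are hypothetical helpers rather than SDK functions:

```python
import time

def get_response_with_retry(fetch, response_id, max_attempts=4, initial_delay=1.0):
    """Poll GET /v1/responses/{id} via `fetch(response_id)`, which returns
    (status_code, body). Backs off exponentially on 429 and surfaces 404
    as expiry, per the retention and error rules above."""
    delay = initial_delay
    for _ in range(max_attempts):
        status, body = fetch(response_id)
        if status == 429:
            time.sleep(delay)  # rate limited: wait, then double the delay
            delay *= 2
            continue
        if status == 404:
            raise LookupError("response expired (stored for 24 hours) or unknown id")
        return body
    raise TimeoutError("still rate limited after retries")
```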
Examples
Simple text query
curl https://api.tensoras.ai/v1/responses \
  -H "Authorization: Bearer $TENSORAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "input": "What is retrieval-augmented generation?"
  }'
RAG with file_search
curl https://api.tensoras.ai/v1/responses \
  -H "Authorization: Bearer $TENSORAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "input": "Summarize our Q4 financial results.",
    "instructions": "Answer based only on the provided documents.",
    "tools": [
      {
        "type": "file_search",
        "file_search": {
          "knowledge_base_ids": ["kb_finance_2024"],
          "max_results": 10,
          "rerank": true
        }
      }
    ]
  }'
Python SDK
from tensoras import Tensoras

client = Tensoras(api_key="tns_...")

response = client.responses.create(
    model="llama-3.3-70b",
    input="What are the key findings in our research docs?",
    tools=[{
        "type": "file_search",
        "file_search": {
            "knowledge_base_ids": ["kb_research"],
        },
    }],
)

for item in response.output:
    if item["type"] == "message":
        print(item["content"][0]["text"])
Node.js SDK
import Tensoras from "@tensoras/sdk";

const client = new Tensoras({ apiKey: "tns_..." });

const response = await client.responses.create({
  model: "llama-3.3-70b",
  input: "What are the key findings in our research docs?",
  tools: [{
    type: "file_search",
    file_search: {
      knowledge_base_ids: ["kb_research"],
    },
  }],
});

for (const item of response.output) {
  if (item.type === "message") {
    console.log(item.content[0].text);
  }
}