API Reference: Responses

Create a response using the agentic tool-calling loop. The server runs a multi-turn loop where the model can issue tool calls (e.g., file_search against knowledge bases), the server executes built-in tools, feeds results back, and the model produces a final text response.

This is the recommended endpoint for agentic workflows that combine LLM reasoning with knowledge base retrieval.

Endpoints

POST https://api.tensoras.ai/v1/responses
GET  https://api.tensoras.ai/v1/responses/{response_id}

Authentication

Include your API key in the Authorization header:

Authorization: Bearer tns_your_key_here

Create a Response

Responses are stored for 24 hours. After this window, GET /v1/responses/{id} returns 404.

Request Body

model (string, required): The model to use (e.g. llama-3.3-70b).
input (string or array, required): A text prompt or a list of input messages. See input format below.
instructions (string, optional): System instructions prepended to the conversation.
tools (array, optional): Tool definitions. See tools below.
tool_choice (string or object, optional, default "auto"): Controls which tools are called: "auto", "none", or "required".
max_output_tokens (integer, optional, default model default): Maximum output tokens per LLM call.
temperature (number, optional): Sampling temperature between 0 and 2.
top_p (number, optional): Nucleus sampling parameter.
stream (boolean, optional, default false): If true, emit server-sent events during execution.
max_turns (integer, optional, default 10): Maximum agentic turns (1–50). Each tool call and response counts as one turn.
metadata (object, optional): Arbitrary key-value metadata attached to the response.
user (string, optional): End-user identifier for tracking.
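For illustration, a request payload combining several of these parameters could be built like this. This is a sketch in Python; only model and input are required, and the other fields use the defaults and ranges documented above:

```python
import json

# Sketch of a request body for POST /v1/responses.
# model and input are required; the rest are optional.
payload = {
    "model": "llama-3.3-70b",
    "input": "Summarize our onboarding docs.",
    "instructions": "Be concise.",
    "max_output_tokens": 512,
    "temperature": 0.2,       # between 0 and 2
    "max_turns": 5,           # within the documented 1-50 range
    "metadata": {"session": "demo"},
}

body = json.dumps(payload)
```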

Input Format

String input — simple text prompt:

{
  "model": "llama-3.3-70b",
  "input": "What is machine learning?"
}

Message list input — multi-turn conversation:

{
  "model": "llama-3.3-70b",
  "input": [
    { "role": "system", "content": "You are a helpful research assistant." },
    { "role": "user", "content": "Summarize the latest findings on RAG." }
  ]
}

Each message has the same format as chat completion messages.

Tools

file_search

Automatically searches your knowledge bases and feeds results back to the model:

{
  "type": "file_search",
  "file_search": {
    "knowledge_base_ids": ["kb_abc123", "kb_def456"],
    "max_results": 10,
    "search_type": "hybrid",
    "score_threshold": 0.0,
    "rerank": false
  }
}
knowledge_base_ids (array, required): Knowledge base IDs to search.
max_results (integer, default 10): Maximum results to retrieve (1–50).
search_type (string, default "hybrid"): One of "semantic", "hybrid", "keyword".
score_threshold (number, default 0.0): Minimum relevance score (0–1).
rerank (boolean, default false): Whether to rerank results.
rerank_model (string, optional): Reranking model to use.
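A client may want to pre-check a file_search config against the documented ranges before sending the request. The helper below is hypothetical, not part of any SDK; it simply enforces the constraints in the table above:

```python
def validate_file_search(cfg: dict) -> dict:
    """Hypothetical client-side check of a file_search tool config
    against the documented field constraints (not part of the SDK)."""
    if not cfg.get("knowledge_base_ids"):
        raise ValueError("knowledge_base_ids is required")
    allowed = {"semantic", "hybrid", "keyword"}
    if cfg.get("search_type", "hybrid") not in allowed:
        raise ValueError(f"search_type must be one of {sorted(allowed)}")
    # Clamp max_results to the documented 1-50 range (default 10).
    cfg["max_results"] = max(1, min(50, cfg.get("max_results", 10)))
    return cfg
```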

function

Define custom functions for the model to call. The server records the call but does not execute it — custom functions are returned in the output for client-side handling.

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": { "type": "string" }
      },
      "required": ["city"]
    }
  }
}

Response Body

{
  "id": "resp_abc123def456",
  "object": "response",
  "created_at": 1709123456,
  "model": "llama-3.3-70b",
  "status": "completed",
  "output": [
    {
      "type": "file_search_call",
      "id": "fs_abc123",
      "queries": ["machine learning fundamentals"],
      "results": [
        {
          "kb_id": "kb_abc123",
          "document_id": "doc_123",
          "document_name": "ml-overview.pdf",
          "text": "Machine learning is a subset of artificial intelligence...",
          "score": 0.95
        }
      ],
      "status": "completed"
    },
    {
      "type": "message",
      "id": "msg_xyz789",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Machine learning is a subset of artificial intelligence that enables systems to learn from data...",
          "annotations": []
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 250,
    "output_tokens": 120,
    "total_tokens": 370
  },
  "metadata": null,
  "tool_call_count": 1,
  "turn_count": 2
}
id (string): Unique response identifier (prefixed with resp_).
object (string): Always "response".
created_at (integer): Unix timestamp of creation.
model (string): The model used.
status (string): "completed" if the loop finished normally; "incomplete" if it hit max_turns or max_output_tokens before finishing; "failed" if an internal error occurred.
output (array): Ordered list of output items. See output items below.
usage (object): Aggregated token usage across all turns.
metadata (object): Metadata provided in the request, or null.
tool_call_count (integer): Total number of tool calls made.
turn_count (integer): Number of agentic turns executed.
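As a sketch of how these fields can be consumed client-side, assuming the response has been parsed into a plain dict, you might summarize a response like this:

```python
def summarize_response(resp: dict) -> str:
    """Build a one-line summary from the response fields documented above."""
    status = resp["status"]
    usage = resp.get("usage") or {}
    line = (f"{resp['id']}: {status}, "
            f"{resp.get('turn_count', 0)} turns, "
            f"{resp.get('tool_call_count', 0)} tool calls, "
            f"{usage.get('total_tokens', 0)} tokens")
    if status == "incomplete":
        line += " (hit max_turns or max_output_tokens)"
    return line

# Values taken from the example response body above.
example = {
    "id": "resp_abc123def456",
    "status": "completed",
    "usage": {"input_tokens": 250, "output_tokens": 120, "total_tokens": 370},
    "tool_call_count": 1,
    "turn_count": 2,
}
```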

Output Items

The output array contains items of different types:

message — The model’s text response:

type (string): "message"
id (string): Unique message ID.
role (string): "assistant"
content (array): List of content parts, each with type, text, and annotations.
status (string): "completed" or "incomplete".

file_search_call — A knowledge base search executed by the server:

type (string): "file_search_call"
id (string): Unique search call ID.
queries (array): The search queries used.
results (array): Retrieved documents with kb_id, document_id, document_name, text, score.
status (string): "completed" or "incomplete".

function_call — A custom function call requested by the model:

type (string): "function_call"
id (string): Unique call ID.
call_id (string): The tool call ID from the model.
name (string): Function name.
arguments (string): JSON-encoded arguments.
status (string): "completed" or "incomplete"; never "failed".

Important: Custom functions are not executed server-side. When a function_call item appears in the output, you must execute the function yourself and return the result. See Handling Function Call Outputs below.
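A small dispatcher makes the three item types concrete. This is a sketch, not SDK code; the field names follow the item schemas above:

```python
def handle_output(output: list) -> dict:
    """Route output items by type (sketch; field names per the tables above)."""
    texts, search_results, pending_calls = [], [], []
    for item in output:
        if item["type"] == "message":
            # Collect the assistant's text parts.
            for part in item["content"]:
                if part["type"] == "output_text":
                    texts.append(part["text"])
        elif item["type"] == "file_search_call":
            # Server-executed searches: results are already attached.
            search_results.extend(item.get("results", []))
        elif item["type"] == "function_call":
            # Not executed server-side: queue for client-side execution.
            pending_calls.append((item["call_id"], item["name"], item["arguments"]))
    return {"texts": texts, "search_results": search_results,
            "pending_calls": pending_calls}

# Example using the item shapes from the response body above.
parsed = handle_output([
    {"type": "file_search_call", "id": "fs_abc123",
     "queries": ["machine learning fundamentals"],
     "results": [{"kb_id": "kb_abc123", "score": 0.95}], "status": "completed"},
    {"type": "message", "id": "msg_xyz789", "role": "assistant",
     "content": [{"type": "output_text", "text": "Machine learning is...",
                  "annotations": []}], "status": "completed"},
])
```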

Streaming

When stream is set to true, the response is delivered as server-sent events (SSE). Each event is a data: line with a JSON payload, and the event type is indicated by the type field in the payload. The stream terminates with data: [DONE].

SSE event types emitted in order:

response.created: Initial response shell (status "incomplete"), emitted immediately after the request is accepted.
response.output_item.added: A new output item (tool call or message) has started. Includes the partial item.
response.output_text.delta: A text content delta for the current message; the delta field contains the new text fragment.
response.output_text.done: The full text for the current message is complete.
response.output_item.done: An output item is fully complete. Includes the final item object.
response.completed: Final event; the full response object with status "completed".
[DONE]: Stream termination sentinel; not a JSON object.
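A minimal way to consume the stream is to parse each data: line and accumulate response.output_text.delta fragments. This sketch assumes each event arrives as a single data: line, as described above:

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line into (event_type, payload).
    Returns (None, None) for blank or non-data lines and
    ("done", None) for the [DONE] sentinel."""
    line = line.strip()
    if not line.startswith("data:"):
        return None, None
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return "done", None
    payload = json.loads(data)
    return payload["type"], payload

# Example: accumulate text deltas from a canned stream.
stream = [
    'data: {"type": "response.created", "status": "incomplete"}',
    'data: {"type": "response.output_text.delta", "delta": "Machine "}',
    'data: {"type": "response.output_text.delta", "delta": "learning..."}',
    'data: [DONE]',
]
text = ""
for raw in stream:
    kind, payload = parse_sse_line(raw)
    if kind == "response.output_text.delta":
        text += payload["delta"]
```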

Handling Function Call Outputs

When the model issues a custom function call, the output contains a function_call item. Your client must inspect the output and execute the function itself. If you want the model to continue reasoning with the result, re-submit the conversation with the result added as a tool role message.

from tensoras import Tensoras
import json
 
client = Tensoras(api_key="tns_...")
 
# 1. Create a response with a custom function tool
response = client.responses.create(
    model="llama-3.3-70b",
    input="What is the weather in Paris?",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    }],
)
 
# 2. Inspect output for function_call items
for item in response.output:
    if item["type"] == "function_call":
        fn_name = item["name"]
        fn_args = json.loads(item["arguments"])
        call_id = item["call_id"]
 
        # 3. Execute the function client-side
        if fn_name == "get_weather":
            result = {"temperature": "15°C", "condition": "cloudy"}
        else:
            result = {"error": f"Unknown function: {fn_name}"}
 
        # 4. (Optional) Re-submit with the result for further model reasoning
        follow_up = client.responses.create(
            model="llama-3.3-70b",
            input=[
                {"role": "user", "content": "What is the weather in Paris?"},
                {"role": "assistant", "tool_calls": [{"id": call_id, "type": "function",
                    "function": {"name": fn_name, "arguments": item["arguments"]}}]},
                {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)},
            ],
        )
        for follow_item in follow_up.output:
            if follow_item["type"] == "message":
                print(follow_item["content"][0]["text"])

Error Responses

All errors use the standard JSON error envelope:

400 Bad Request — invalid parameters or content policy violation:

{
  "error": {
    "message": "Field 'model' is required.",
    "type": "invalid_request_error",
    "param": "model",
    "code": "missing_required_field"
  }
}

402 Payment Required — insufficient credits or billing issue:

{
  "error": {
    "message": "Insufficient credits. Please add credits to your account.",
    "type": "billing_error",
    "param": null,
    "code": "insufficient_credits"
  }
}

429 Too Many Requests — rate limit exceeded:

{
  "error": {
    "message": "Rate limit exceeded. Please slow down your requests.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
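Clients commonly retry 429s with exponential backoff while treating billing and validation errors as fatal. The helpers below are a common client-side pattern, not part of the SDK; they key off the code field in the error envelope above:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with jitter for rate_limit_exceeded (429) errors."""
    delay = min(cap, base * (2 ** attempt))
    return delay * (0.5 + random.random() / 2)  # jitter in [0.5, 1.0) * delay

# Only rate limits are worth retrying; billing and validation errors are fatal.
RETRYABLE_CODES = {"rate_limit_exceeded"}

def should_retry(error: dict, attempt: int, max_attempts: int = 5) -> bool:
    """Decide whether to retry based on the error envelope's code field."""
    return attempt < max_attempts and error.get("code") in RETRYABLE_CODES
```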

Retrieve a Response

Retrieve a previously-created response by its ID. Responses are stored for 24 hours. After this window, the endpoint returns 404.

GET https://api.tensoras.ai/v1/responses/{response_id}

Path Parameters

response_id (string): The response ID (e.g. resp_abc123).

Response

Returns the same response body as the create endpoint, or 404 if the response does not exist or has expired.

Examples

Simple text query

curl https://api.tensoras.ai/v1/responses \
  -H "Authorization: Bearer $TENSORAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "input": "What is retrieval-augmented generation?"
  }'
File search query

curl https://api.tensoras.ai/v1/responses \
  -H "Authorization: Bearer $TENSORAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "input": "Summarize our Q4 financial results.",
    "instructions": "Answer based only on the provided documents.",
    "tools": [
      {
        "type": "file_search",
        "file_search": {
          "knowledge_base_ids": ["kb_finance_2024"],
          "max_results": 10,
          "rerank": true
        }
      }
    ]
  }'

Python SDK

from tensoras import Tensoras
 
client = Tensoras(api_key="tns_...")
 
response = client.responses.create(
    model="llama-3.3-70b",
    input="What are the key findings in our research docs?",
    tools=[{
        "type": "file_search",
        "file_search": {
            "knowledge_base_ids": ["kb_research"],
        },
    }],
)
 
for item in response.output:
    if item["type"] == "message":
        print(item["content"][0]["text"])

Node.js SDK

import Tensoras from "@tensoras/sdk";
 
const client = new Tensoras({ apiKey: "tns_..." });
 
const response = await client.responses.create({
  model: "llama-3.3-70b",
  input: "What are the key findings in our research docs?",
  tools: [{
    type: "file_search",
    file_search: {
      knowledge_base_ids: ["kb_research"],
    },
  }],
});
 
for (const item of response.output) {
  if (item.type === "message") {
    console.log(item.content[0].text);
  }
}