API Reference: Responses

Create a response using the agentic tool-calling loop. The server runs a multi-turn loop where the model can issue tool calls (e.g., file_search against knowledge bases), the server executes built-in tools, feeds results back, and the model produces a final text response.

This is the recommended endpoint for agentic workflows that combine LLM reasoning with knowledge base retrieval.

Endpoints

POST https://api.tensoras.ai/v1/responses
GET  https://api.tensoras.ai/v1/responses/{response_id}

Authentication

Include your API key in the Authorization header:

Authorization: Bearer tns_your_key_here

Create a Response

Responses are stored for 24 hours. After this window, GET /v1/responses/{id} returns 404.

Request Body

model (string, required): The model to use (e.g. llama-3.3-70b).
input (string or array, required): A text prompt or a list of input messages. See input format below.
instructions (string, optional): System instructions prepended to the conversation.
tools (array, optional): Tool definitions. See tools below.
tool_choice (string or object, optional, default "auto"): Controls which tools are called: "auto", "none", or "required".
max_output_tokens (integer, optional, default model default): Maximum output tokens per LLM call.
temperature (number, optional): Sampling temperature between 0 and 2.
top_p (number, optional): Nucleus sampling parameter.
stream (boolean, optional, default false): If true, emit server-sent events during execution.
max_turns (integer, optional, default 10): Maximum agentic turns (1–50). Each tool call and response counts as one turn.
metadata (object, optional): Arbitrary key-value metadata attached to the response.
user (string, optional): End-user identifier for tracking.
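For illustration, a request payload combining several of these parameters could be built like this. This is a sketch in Python; only model and input are required, and the other fields use the defaults and ranges documented above:

```python
import json

# Sketch of a request body for POST /v1/responses.
# model and input are required; the rest are optional.
payload = {
    "model": "llama-3.3-70b",
    "input": "Summarize our onboarding docs.",
    "instructions": "Be concise.",
    "max_output_tokens": 512,
    "temperature": 0.2,       # between 0 and 2
    "max_turns": 5,           # within the documented 1-50 range
    "metadata": {"session": "demo"},
}

body = json.dumps(payload)
```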

Input Format

String input — simple text prompt:

{
  "model": "llama-3.3-70b",
  "input": "What is machine learning?"
}

Message list input — multi-turn conversation:

{
  "model": "llama-3.3-70b",
  "input": [
    { "role": "system", "content": "You are a helpful research assistant." },
    { "role": "user", "content": "Summarize the latest findings on RAG." }
  ]
}

Each message has the same format as chat completion messages.

Tools

file_search

Automatically searches your knowledge bases and feeds results back to the model:

{
  "type": "file_search",
  "file_search": {
    "knowledge_base_ids": ["kb_abc123", "kb_def456"],
    "max_results": 10,
    "search_type": "hybrid",
    "score_threshold": 0.0,
    "rerank": false
  }
}
knowledge_base_ids (array, required): Knowledge base IDs to search.
max_results (integer, default 10): Maximum results to retrieve (1–50).
search_type (string, default "hybrid"): One of "semantic", "hybrid", "keyword".
score_threshold (number, default 0.0): Minimum relevance score (0–1).
rerank (boolean, default false): Whether to rerank results.
rerank_model (string, optional): Reranking model to use.
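A client may want to pre-check a file_search config against the documented ranges before sending the request. The helper below is hypothetical, not part of any SDK; it simply enforces the constraints in the table above:

```python
def validate_file_search(cfg: dict) -> dict:
    """Hypothetical client-side check of a file_search tool config
    against the documented field constraints (not part of the SDK)."""
    if not cfg.get("knowledge_base_ids"):
        raise ValueError("knowledge_base_ids is required")
    allowed = {"semantic", "hybrid", "keyword"}
    if cfg.get("search_type", "hybrid") not in allowed:
        raise ValueError(f"search_type must be one of {sorted(allowed)}")
    # Clamp max_results to the documented 1-50 range (default 10).
    cfg["max_results"] = max(1, min(50, cfg.get("max_results", 10)))
    return cfg
```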

function

Define custom functions for the model to call. The server records the call but does not execute it — custom functions are returned in the output for client-side handling.

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": { "type": "string" }
      },
      "required": ["city"]
    }
  }
}

Response Body

{
  "id": "resp_abc123def456",
  "object": "response",
  "created_at": 1709123456,
  "model": "llama-3.3-70b",
  "status": "completed",
  "output": [
    {
      "type": "file_search_call",
      "id": "fs_abc123",
      "queries": ["machine learning fundamentals"],
      "results": [
        {
          "kb_id": "kb_abc123",
          "document_id": "doc_123",
          "document_name": "ml-overview.pdf",
          "text": "Machine learning is a subset of artificial intelligence...",
          "score": 0.95
        }
      ],
      "status": "completed"
    },
    {
      "type": "message",
      "id": "msg_xyz789",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Machine learning is a subset of artificial intelligence that enables systems to learn from data...",
          "annotations": []
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 250,
    "output_tokens": 120,
    "total_tokens": 370
  },
  "metadata": null,
  "tool_call_count": 1,
  "turn_count": 2
}
id (string): Unique response identifier (prefixed with resp_).
object (string): Always "response".
created_at (integer): Unix timestamp of creation.
model (string): The model used.
status (string): "completed" if the loop finished normally; "incomplete" if it hit max_turns or max_output_tokens before finishing; "failed" if an internal error occurred.
output (array): Ordered list of output items. See output items below.
usage (object): Aggregated token usage across all turns.
metadata (object): Metadata provided in the request, or null.
tool_call_count (integer): Total number of tool calls made.
turn_count (integer): Number of agentic turns executed.
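As a sketch of how these fields can be consumed client-side, assuming the response has been parsed into a plain dict, you might summarize a response like this:

```python
def summarize_response(resp: dict) -> str:
    """Build a one-line summary from the response fields documented above."""
    status = resp["status"]
    usage = resp.get("usage") or {}
    line = (f"{resp['id']}: {status}, "
            f"{resp.get('turn_count', 0)} turns, "
            f"{resp.get('tool_call_count', 0)} tool calls, "
            f"{usage.get('total_tokens', 0)} tokens")
    if status == "incomplete":
        line += " (hit max_turns or max_output_tokens)"
    return line

# Values taken from the example response body above.
example = {
    "id": "resp_abc123def456",
    "status": "completed",
    "usage": {"input_tokens": 250, "output_tokens": 120, "total_tokens": 370},
    "tool_call_count": 1,
    "turn_count": 2,
}
```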

Output Items

The output array contains items of different types:

message — The model’s text response:

type (string): "message"
id (string): Unique message ID.
role (string): "assistant"
content (array): List of content parts, each with type, text, and annotations.
status (string): "completed" or "incomplete".

file_search_call — A knowledge base search executed by the server:

type (string): "file_search_call"
id (string): Unique search call ID.
queries (array): The search queries used.
results (array): Retrieved documents with kb_id, document_id, document_name, text, score.
status (string): "completed" or "incomplete".

function_call — A custom function call requested by the model:

type (string): "function_call"
id (string): Unique call ID.
call_id (string): The tool call ID from the model.
name (string): Function name.
arguments (string): JSON-encoded arguments.
status (string): "completed" or "incomplete"; never "failed".

Important: Custom functions are not executed server-side. When a function_call item appears in the output, you must execute the function yourself and return the result. See Handling Function Call Outputs below.
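A small dispatcher makes the three item types concrete. This is a sketch, not SDK code; the field names follow the item schemas above:

```python
def handle_output(output: list) -> dict:
    """Route output items by type (sketch; field names per the tables above)."""
    texts, search_results, pending_calls = [], [], []
    for item in output:
        if item["type"] == "message":
            # Collect the assistant's text parts.
            for part in item["content"]:
                if part["type"] == "output_text":
                    texts.append(part["text"])
        elif item["type"] == "file_search_call":
            # Server-executed searches: results are already attached.
            search_results.extend(item.get("results", []))
        elif item["type"] == "function_call":
            # Not executed server-side: queue for client-side execution.
            pending_calls.append((item["call_id"], item["name"], item["arguments"]))
    return {"texts": texts, "search_results": search_results,
            "pending_calls": pending_calls}

# Example using the item shapes from the response body above.
parsed = handle_output([
    {"type": "file_search_call", "id": "fs_abc123",
     "queries": ["machine learning fundamentals"],
     "results": [{"kb_id": "kb_abc123", "score": 0.95}], "status": "completed"},
    {"type": "message", "id": "msg_xyz789", "role": "assistant",
     "content": [{"type": "output_text", "text": "Machine learning is...",
                  "annotations": []}], "status": "completed"},
])
```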

Streaming

When stream is set to true, the response is delivered as server-sent events (SSE). Each event is a data: line with a JSON payload, and the event type is indicated by the type field in the payload. The stream terminates with data: [DONE].

SSE event types emitted in order:

response.created: Initial response shell (status "incomplete"), emitted immediately after the request is accepted.
response.output_item.added: A new output item (tool call or message) has started. Includes the partial item.
response.output_text.delta: A text content delta for the current message; the delta field contains the new text fragment.
response.output_text.done: The full text for the current message is complete.
response.output_item.done: An output item is fully complete. Includes the final item object.
response.completed: Final event; the full response object with status "completed".
[DONE]: Stream termination sentinel; not a JSON object.
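A minimal way to consume the stream is to parse each data: line and accumulate response.output_text.delta fragments. This sketch assumes each event arrives as a single data: line, as described above:

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line into (event_type, payload).
    Returns (None, None) for blank or non-data lines and
    ("done", None) for the [DONE] sentinel."""
    line = line.strip()
    if not line.startswith("data:"):
        return None, None
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return "done", None
    payload = json.loads(data)
    return payload["type"], payload

# Example: accumulate text deltas from a canned stream.
stream = [
    'data: {"type": "response.created", "status": "incomplete"}',
    'data: {"type": "response.output_text.delta", "delta": "Machine "}',
    'data: {"type": "response.output_text.delta", "delta": "learning..."}',
    'data: [DONE]',
]
text = ""
for raw in stream:
    kind, payload = parse_sse_line(raw)
    if kind == "response.output_text.delta":
        text += payload["delta"]
```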

Handling Function Call Outputs

When the model issues a custom function call, the output contains a function_call item. Your client must inspect the output and execute the function itself. If you want the model to continue reasoning with the result, re-submit the conversation with the result added as a tool role message.

from tensoras import Tensoras
import json
 
client = Tensoras(api_key="tns_...")
 
# 1. Create a response with a custom function tool
response = client.responses.create(
    model="llama-3.3-70b",
    input="What is the weather in Paris?",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    }],
)
 
# 2. Inspect output for function_call items
for item in response.output:
    if item["type"] == "function_call":
        fn_name = item["name"]
        fn_args = json.loads(item["arguments"])
        call_id = item["call_id"]
 
        # 3. Execute the function client-side
        if fn_name == "get_weather":
            result = {"temperature": "15°C", "condition": "cloudy"}
        else:
            result = {"error": f"Unknown function: {fn_name}"}
 
        # 4. (Optional) Re-submit with the result for further model reasoning
        follow_up = client.responses.create(
            model="llama-3.3-70b",
            input=[
                {"role": "user", "content": "What is the weather in Paris?"},
                {"role": "assistant", "tool_calls": [{"id": call_id, "type": "function",
                    "function": {"name": fn_name, "arguments": item["arguments"]}}]},
                {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)},
            ],
        )
        for follow_item in follow_up.output:
            if follow_item["type"] == "message":
                print(follow_item["content"][0]["text"])

Error Responses

All errors use the standard JSON error envelope:

400 Bad Request — invalid parameters or content policy violation:

{
  "error": {
    "message": "Field 'model' is required.",
    "type": "invalid_request_error",
    "param": "model",
    "code": "missing_required_field"
  }
}

402 Payment Required — insufficient credits or billing issue:

{
  "error": {
    "message": "Insufficient credits. Please add credits to your account.",
    "type": "billing_error",
    "param": null,
    "code": "insufficient_credits"
  }
}

429 Too Many Requests — rate limit exceeded:

{
  "error": {
    "message": "Rate limit exceeded. Please slow down your requests.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
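Clients commonly retry 429s with exponential backoff while treating billing and validation errors as fatal. The helpers below are a common client-side pattern, not part of the SDK; they key off the code field in the error envelope above:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with jitter for rate_limit_exceeded (429) errors."""
    delay = min(cap, base * (2 ** attempt))
    return delay * (0.5 + random.random() / 2)  # jitter in [0.5, 1.0) * delay

# Only rate limits are worth retrying; billing and validation errors are fatal.
RETRYABLE_CODES = {"rate_limit_exceeded"}

def should_retry(error: dict, attempt: int, max_attempts: int = 5) -> bool:
    """Decide whether to retry based on the error envelope's code field."""
    return attempt < max_attempts and error.get("code") in RETRYABLE_CODES
```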

Retrieve a Response

Retrieve a previously-created response by its ID. Responses are stored for 24 hours. After this window, the endpoint returns 404.

GET https://api.tensoras.ai/v1/responses/{response_id}

Path Parameters

response_id (string): The response ID (e.g. resp_abc123).

Response

Returns the same response body as the create endpoint, or 404 if the response does not exist or has expired.

Examples

Simple text query

curl https://api.tensoras.ai/v1/responses \
  -H "Authorization: Bearer $TENSORAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "input": "What is retrieval-augmented generation?"
  }'
File search query

curl https://api.tensoras.ai/v1/responses \
  -H "Authorization: Bearer $TENSORAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "input": "Summarize our Q4 financial results.",
    "instructions": "Answer based only on the provided documents.",
    "tools": [
      {
        "type": "file_search",
        "file_search": {
          "knowledge_base_ids": ["kb_finance_2024"],
          "max_results": 10,
          "rerank": true
        }
      }
    ]
  }'

Python SDK

from tensoras import Tensoras
 
client = Tensoras(api_key="tns_...")
 
response = client.responses.create(
    model="llama-3.3-70b",
    input="What are the key findings in our research docs?",
    tools=[{
        "type": "file_search",
        "file_search": {
            "knowledge_base_ids": ["kb_research"],
        },
    }],
)
 
for item in response.output:
    if item["type"] == "message":
        print(item["content"][0]["text"])

Node.js SDK

import Tensoras from "@tensoras/sdk";
 
const client = new Tensoras({ apiKey: "tns_..." });
 
const response = await client.responses.create({
  model: "llama-3.3-70b",
  input: "What are the key findings in our research docs?",
  tools: [{
    type: "file_search",
    file_search: {
      knowledge_base_ids: ["kb_research"],
    },
  }],
});
 
for (const item of response.output) {
  if (item.type === "message") {
    console.log(item.content[0].text);
  }
}