Tool Calling

Tool calling (also known as function calling) lets the model invoke external functions you define. You describe available tools in your request, and the model can choose to call one or more of them instead of — or in addition to — generating a text response. Your code executes the function, returns the result, and the model uses it to produce a final answer.

Supported Models

Tool calling is supported on all chat completion models:

  • llama-3.3-70b
  • llama-3.1-8b
  • qwen-3-32b
  • mistral-7b-instruct
  • deepseek-r1-distill-70b
  • codestral-latest

Defining Tools

A tool definition includes a function object with a name, description, and a JSON Schema for parameters:

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get the current weather for a given city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "The city name, e.g. 'San Francisco'"
        },
        "units": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"],
          "description": "Temperature unit"
        }
      },
      "required": ["city"]
    }
  }
}

Tips for tool definitions:

  • Write clear, specific descriptions — the model uses them to decide when and how to call each tool.
  • Mark only truly required parameters as required.
  • Use enum to constrain values where possible.
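The tips above can also be applied programmatically. As a sketch, a small helper (hypothetical, not part of the Tensoras SDK) can assemble a well-formed tool definition so every tool gets a specific description, a minimal required list, and enums where values are constrained:

```python
# Hypothetical helper -- not part of the Tensoras SDK. Builds a tool
# definition dict in the shape shown above.
def make_tool(name, description, properties, required=None):
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required or [],
            },
        },
    }

# Example: a currency-conversion tool with enum-constrained values.
convert_tool = make_tool(
    name="convert_currency",
    description="Convert an amount from one currency to another.",
    properties={
        "amount": {"type": "number", "description": "Amount to convert"},
        "from": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
        "to": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
    },
    required=["amount", "from", "to"],
)
```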

Full Python Example

import json
from tensoras import Tensoras
 
client = Tensoras(api_key="tns_your_key_here")
 
# 1. Define your tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'San Francisco'",
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit",
                    },
                },
                "required": ["city"],
            },
        },
    }
]
 
# 2. Your actual function implementation
def get_weather(city: str, units: str = "celsius") -> dict:
    # In a real app, call a weather API here
    return {"city": city, "temperature": 18, "units": units, "condition": "partly cloudy"}
 
# 3. Send the initial request with tools
messages = [
    {"role": "user", "content": "What's the weather like in San Francisco?"}
]
 
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=messages,
    tools=tools,
)
 
message = response.choices[0].message
 
# 4. Check if the model wants to call a tool
if message.tool_calls:
    # Append the assistant message (with tool_calls) to the conversation
    messages.append(message)
 
    for tool_call in message.tool_calls:
        # Parse arguments and call the function
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)
 
        # Append the tool result
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result),
        })
 
    # 5. Send the conversation back so the model can produce a final answer
    final_response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=messages,
        tools=tools,
    )
 
    print(final_response.choices[0].message.content)
else:
    print(message.content)

Output

The weather in San Francisco is currently 18°C and partly cloudy.

Full Node.js Example

import Tensoras from "tensoras";
 
const client = new Tensoras({ apiKey: "tns_your_key_here" });
 
// 1. Define your tools
const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a given city.",
      parameters: {
        type: "object",
        properties: {
          city: {
            type: "string",
            description: "The city name, e.g. 'San Francisco'",
          },
          units: {
            type: "string",
            enum: ["celsius", "fahrenheit"],
            description: "Temperature unit",
          },
        },
        required: ["city"],
      },
    },
  },
];
 
// 2. Your actual function implementation
function getWeather(city, units = "celsius") {
  return { city, temperature: 18, units, condition: "partly cloudy" };
}
 
// 3. Send the initial request with tools
const messages = [
  { role: "user", content: "What's the weather like in San Francisco?" },
];
 
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages,
  tools,
});
 
const message = response.choices[0].message;
 
// 4. Check if the model wants to call a tool
if (message.tool_calls) {
  messages.push(message);
 
  for (const toolCall of message.tool_calls) {
    const args = JSON.parse(toolCall.function.arguments);
    const result = getWeather(args.city, args.units);
 
    messages.push({
      role: "tool",
      tool_call_id: toolCall.id,
      content: JSON.stringify(result),
    });
  }
 
  // 5. Send the conversation back for a final answer
  const finalResponse = await client.chat.completions.create({
    model: "llama-3.3-70b",
    messages,
    tools,
  });
 
  console.log(finalResponse.choices[0].message.content);
} else {
  console.log(message.content);
}

How the Conversation Flows

A tool-calling conversation typically follows this pattern:

  1. You send messages + tool definitions.
  2. Model responds with tool_calls (instead of or alongside text content).
  3. You execute each function and append role: "tool" messages with the results.
  4. Model uses the tool results to produce a final text response.

The model may call multiple tools in a single response. Each tool call has a unique id that you must reference when returning the result via tool_call_id.

Controlling Tool Use

You can guide the model’s tool-calling behavior with the tool_choice parameter:

  • "auto" (default): the model decides whether to call a tool or respond with text.
  • "none": the model will not call any tools.
  • "required": the model must call at least one tool.
  • {"type": "function", "function": {"name": "get_weather"}}: the model must call the specified tool.
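For example, to force a call to get_weather on a given turn, pass the function-specific form. The request is sketched here as a plain dict of keyword arguments (pass the same fields to client.chat.completions.create); the tools list is abbreviated from the definition above:

```python
# Abbreviated tool definition from the earlier example.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a given city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Force the model to call get_weather rather than answer with text.
request_kwargs = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Weather in Tokyo?"}],
    "tools": tools,
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```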

Streaming with Tool Calls

Tool calls work with streaming. When streaming, tool call arguments arrive incrementally in delta.tool_calls. See Streaming for more details.
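Reassembling streamed arguments can be sketched as follows. This assumes OpenAI-style fragments (an index per tool call, id and name on the first fragment, then argument string pieces to concatenate); the fragments below are simulated rather than read from a live stream:

```python
import json

def accumulate_tool_calls(deltas):
    """Reassemble streamed tool calls from delta fragments.

    Each fragment carries an `index` identifying which tool call it
    belongs to; `arguments` pieces are concatenated in arrival order.
    """
    calls = {}
    for delta in deltas:
        call = calls.setdefault(delta["index"], {"id": None, "name": None, "arguments": ""})
        if delta.get("id"):
            call["id"] = delta["id"]
        if delta.get("name"):
            call["name"] = delta["name"]
        call["arguments"] += delta.get("arguments", "")
    return calls

# Simulated stream fragments for one get_weather call.
fragments = [
    {"index": 0, "id": "call_abc", "name": "get_weather", "arguments": '{"ci'},
    {"index": 0, "arguments": 'ty": "San Fran'},
    {"index": 0, "arguments": 'cisco"}'},
]
calls = accumulate_tool_calls(fragments)
args = json.loads(calls[0]["arguments"])  # {"city": "San Francisco"}
```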

Server-Side Tool Calling with the Responses API

If you want the server to execute tool calls automatically (e.g., searching your Knowledge Bases), use the Responses API instead. The server runs a multi-turn agentic loop where the model issues tool calls, the server executes built-in tools like file_search, and the model produces a final answer — all in a single request.

response = client.responses.create(
    model="llama-3.3-70b",
    input="What do our docs say about SSO configuration?",
    tools=[{
        "type": "file_search",
        "file_search": {
            "knowledge_base_ids": ["kb_abc123"],
        },
    }],
)

See the Responses API reference for details.