# Reasoning
Some models on Tensoras have built-in reasoning capabilities — they perform step-by-step thinking before producing a final answer. This improves accuracy on complex tasks like multi-step math, logic puzzles, code generation, and scientific analysis.
## Reasoning Models

| Model | Description |
|---|---|
| qwen-3-32b | Qwen 3 32B — strong reasoning across math, logic, and code |
| deepseek-r1-distill-70b | DeepSeek R1 Distill 70B — distilled reasoning model with chain-of-thought |
| deepseek-r1-distill-8b | DeepSeek R1 Distill 8B — lighter distilled model using `<think>` tag parsing; does not support the `thinking` config parameter |
## How It Works
Reasoning is automatic — you do not need to set any special parameters. When you send a request to a reasoning model, it internally generates a chain-of-thought before producing the final answer. The final response you receive is the polished answer after the model has finished reasoning.
There is nothing to configure. Just use a reasoning model and the model handles the rest.
### Python Example

```python
from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

response = client.chat.completions.create(
    model="deepseek-r1-distill-70b",
    messages=[
        {
            "role": "user",
            "content": (
                "A farmer has 3 fields. The first field is twice the size of the second. "
                "The third field is 10 acres larger than the first. Together they total "
                "130 acres. How large is each field?"
            ),
        },
    ],
)

print(response.choices[0].message.content)
```

Example output:

```
Let me work through this step by step.

Let the second field be x acres.
- First field: 2x acres
- Third field: 2x + 10 acres

Setting up the equation:
x + 2x + (2x + 10) = 130
5x + 10 = 130
5x = 120
x = 24

So:
- Second field: 24 acres
- First field: 48 acres
- Third field: 58 acres

We can verify: 24 + 48 + 58 = 130 ✓
```

### Node.js Example
```javascript
import Tensoras from "tensoras";

const client = new Tensoras({ apiKey: "tns_your_key_here" });

const response = await client.chat.completions.create({
  model: "qwen-3-32b",
  messages: [
    {
      role: "user",
      content:
        "I have 5 shirts, 4 pairs of pants, and 3 pairs of shoes. " +
        "If I must wear one of each, and I refuse to wear my red shirt " +
        "with my brown shoes, how many valid outfits can I make?",
    },
  ],
});

console.log(response.choices[0].message.content);
```

## When to Use Reasoning Models
Reasoning models shine on tasks that benefit from deliberate, multi-step thinking:
- Math and quantitative problems — word problems, algebra, combinatorics
- Logic puzzles and constraints — scheduling, deduction, constraint satisfaction
- Code generation and debugging — complex algorithms, tricky edge cases
- Scientific analysis — multi-step derivations, data interpretation
- Planning and strategy — breaking down complex tasks into steps
For straightforward tasks like summarization, translation, or simple Q&A, standard models like llama-3.3-70b are typically faster and equally capable.
## Streaming with Reasoning Models
Reasoning models support streaming just like all other models. Set `stream=True` to receive tokens as they are generated. See Streaming for details.
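As a sketch of consuming such a stream, the helper below accumulates delta text into the final answer string. The `collect_stream` function and the stubbed chunk objects are illustrative, not part of the SDK, but they mirror the `chunk.choices[0].delta.content` shape used in the streaming examples elsewhere in this guide.

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Accumulate streamed delta content into the final answer string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # Skip chunks whose delta carries no content (e.g. role or stop chunks)
        if getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)

# Stubbed chunks standing in for a real stream from the API
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
    for text in ["The answer ", "is 144.", None]
]
print(collect_stream(fake_stream))  # The answer is 144.
```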
## Compatibility

Reasoning models are fully compatible with all other features.
# Extended Thinking
Extended thinking lets the model show its chain-of-thought reasoning as structured content blocks. Enable it with the thinking parameter.
## Enabling Thinking
```python
from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

response = client.chat.completions.create(
    model="deepseek-r1-distill-70b",
    messages=[{"role": "user", "content": "What is the sum of angles in a polygon with n sides?"}],
    thinking={"type": "enabled", "budget_tokens": 4096},
)

# Access the full content block list
for block in response.choices[0].message.content:
    if block["type"] == "thinking":
        print("Reasoning:", block["thinking"])
    elif block["type"] == "text":
        print("Answer:", block["text"])
```

```typescript
import Tensoras from "tensoras";

const client = new Tensoras({ apiKey: "tns_your_key_here" });

const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-70b",
  messages: [{ role: "user", content: "What is the sum of angles in a polygon with n sides?" }],
  thinking: { type: "enabled", budget_tokens: 4096 },
});

const content = response.choices[0].message.content;
if (Array.isArray(content)) {
  for (const block of content) {
    if (block.type === "thinking") {
      console.log("Reasoning:", block.thinking);
    } else if (block.type === "text") {
      console.log("Answer:", block.text);
    }
  }
}
```

## Response Format
When thinking is enabled, `message.content` becomes an array of content blocks:

- `{type: "thinking", thinking: "..."}` — the model's reasoning process
- `{type: "text", text: "..."}` — the final answer
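A small helper, not part of the SDK, can normalize both shapes a response may take: the plain string returned when thinking is disabled, and the content-block list described above.

```python
def split_content(content):
    """Separate reasoning from the final answer in message.content.

    Handles both shapes: a plain string (thinking disabled) or a list
    of content blocks (thinking enabled).
    """
    if isinstance(content, str):
        return "", content
    thinking = "".join(b["thinking"] for b in content if b["type"] == "thinking")
    text = "".join(b["text"] for b in content if b["type"] == "text")
    return thinking, text

blocks = [
    {"type": "thinking", "thinking": "Sum of interior angles is (n - 2) * 180."},
    {"type": "text", "text": "(n - 2) x 180 degrees."},
]
print(split_content(blocks)[1])          # (n - 2) x 180 degrees.
print(split_content("Plain answer.")[1])  # Plain answer.
```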
```python
response = client.chat.completions.create(
    model="deepseek-r1-distill-70b",
    messages=[{"role": "user", "content": "Solve: if 3x + 7 = 22, what is x?"}],
    thinking={"type": "enabled", "budget_tokens": 2048},
)

content = response.choices[0].message.content
for block in content:
    if block["type"] == "thinking":
        # Internal reasoning — typically hidden from end users
        reasoning = block["thinking"]
    elif block["type"] == "text":
        # Final answer shown to users
        answer = block["text"]
```

```typescript
import type { ThinkingContentBlock, TextContentBlock } from "tensoras";

const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-70b",
  messages: [{ role: "user", content: "Solve: if 3x + 7 = 22, what is x?" }],
  thinking: { type: "enabled", budget_tokens: 2048 },
});

const content = response.choices[0].message.content;
if (Array.isArray(content)) {
  const thinkingBlocks = content.filter(
    (b): b is ThinkingContentBlock => b.type === "thinking",
  );
  const textBlocks = content.filter(
    (b): b is TextContentBlock => b.type === "text",
  );
}
```

## Token Budget
`budget_tokens` controls how many tokens the model can use for thinking. The valid range is 1024–16384; values outside this range are clamped at the gateway. Higher budgets produce more thorough reasoning but are slower and costlier.
| Budget | Use When |
|---|---|
| 1024 | Simple problems needing brief reasoning |
| 4096 | Standard complex reasoning |
| 16384 | Maximum depth problems |
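The clamping rule described above can be sketched as a one-liner. This `clamp_budget` helper is illustrative only; the gateway performs the clamping for you.

```python
def clamp_budget(budget_tokens: int) -> int:
    """Clamp budget_tokens to the gateway's valid range of 1024-16384."""
    return max(1024, min(budget_tokens, 16384))

print(clamp_budget(512))    # 1024
print(clamp_budget(4096))   # 4096
print(clamp_budget(50000))  # 16384
```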
## Billing

Thinking tokens are billed at 50% of the standard output token rate, since they represent intermediate computation rather than final output. For example, 1000 thinking tokens cost the same as 500 output tokens. They appear in `usage.completion_tokens_details.thinking_tokens`.
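As an illustrative sketch of the 50% rule, the `thinking_token_cost` helper and the rate value below are hypothetical, not part of the SDK or the published price list.

```python
def thinking_token_cost(thinking_tokens: int, output_rate: float) -> float:
    """Cost of thinking tokens, billed at 50% of the output-token rate."""
    return thinking_tokens * output_rate * 0.5

rate = 2e-6  # hypothetical per-token output rate, for illustration only
# 1000 thinking tokens cost the same as 500 output tokens
assert thinking_token_cost(1000, rate) == 500 * rate
```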
```python
response = client.chat.completions.create(
    model="deepseek-r1-distill-70b",
    messages=[{"role": "user", "content": "Hard problem"}],
    thinking={"type": "enabled", "budget_tokens": 8192},
)

usage = response.usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
if usage.completion_tokens_details:
    print(f"Thinking tokens: {usage.completion_tokens_details.thinking_tokens}")
    # Thinking tokens are billed at half the output token rate
```

## Supported Models

Extended thinking (the `thinking` config parameter) works with the following reasoning-capable models:

- `deepseek-r1-distill-70b`
- `qwen-3-32b` (with reasoning mode)

Note: `deepseek-r1-distill-8b` is not supported for extended thinking via the `thinking` parameter. That model uses `<think>` tag parsing internally and does not expose structured thinking blocks.
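For intuition, the `<think>` tag convention can be sketched as below. This `parse_think_tags` helper is illustrative only; the gateway performs this parsing internally for deepseek-r1-distill-8b, so you never need to do it yourself.

```python
import re

def parse_think_tags(raw: str):
    """Split a raw <think>-tagged completion into (reasoning, answer)."""
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", raw, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    # No tags present: treat the whole completion as the answer
    return "", raw.strip()

raw = "<think>17 x 24 = 17 x 25 - 17 = 408.</think>The product is 408."
reasoning, answer = parse_think_tags(raw)
print(answer)  # The product is 408.
```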
## Streaming with Extended Thinking

When extended thinking is enabled alongside `stream=True`, each streamed chunk's `delta` includes a `content_type` field indicating whether the delta is part of a thinking block or the final text answer.
### Python

```python
with client.chat.completions.stream(
    model="deepseek-r1-distill-70b",
    messages=[{"role": "user", "content": "Solve: what is 17 × 24?"}],
    thinking={"type": "enabled", "budget_tokens": 2048},
) as stream:
    thinking_chunks = []
    text_chunks = []
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content_type == "thinking" and delta.content:
            thinking_chunks.append(delta.content)
        elif delta.content_type == "text" and delta.content:
            text_chunks.append(delta.content)

thinking = "".join(thinking_chunks)
answer = "".join(text_chunks)
print("Reasoning:", thinking)
print("Answer:", answer)
```

### TypeScript
```typescript
const stream = await client.chat.completions.stream({
  model: "deepseek-r1-distill-70b",
  messages: [{ role: "user", content: "Solve: what is 17 × 24?" }],
  thinking: { type: "enabled", budget_tokens: 2048 },
});

const thinkingChunks: string[] = [];
const textChunks: string[] = [];

for await (const chunk of stream) {
  const delta = chunk.choices[0].delta;
  if (delta.content_type === "thinking" && delta.content) {
    thinkingChunks.push(delta.content);
  } else if (delta.content_type === "text" && delta.content) {
    textChunks.push(delta.content);
  }
}

const thinking = thinkingChunks.join("");
const answer = textChunks.join("");
console.log("Reasoning:", thinking);
console.log("Answer:", answer);
```

## Stripping Thinking Blocks Before Display
In most production UIs you will want to display only the final answer text to users, not the raw reasoning. Use this pattern to extract just the text content from a non-streaming response:
```python
# Strip thinking blocks — show only final answer text to users
text_content = " ".join(
    block["text"]
    for block in response.choices[0].message.content
    if block["type"] == "text"
)
print(text_content)
```

## Best Practices
- Hide thinking blocks from end users in production UIs — they are intermediate computation
- Use thinking for complex multi-step problems: math, logic, code reasoning, planning
- Start with a `budget_tokens` of 4096 and adjust based on response quality
- Plain string responses are still returned when thinking is disabled or not set
## Related
- Streaming — stream reasoning model responses in real time
- Chat Completions API — full endpoint reference
- Models API — list available models