# Reasoning
Some models on Tensoras have built-in reasoning capabilities — they perform step-by-step thinking before producing a final answer. This improves accuracy on complex tasks like multi-step math, logic puzzles, code generation, and scientific analysis.
## Reasoning Models

| Model | Description |
|---|---|
| qwen-3-32b | Qwen 3 32B — strong reasoning across math, logic, and code |
| deepseek-r1-distill-70b | DeepSeek R1 Distill 70B — distilled reasoning model with chain-of-thought |
| deepseek-r1-distill-8b | DeepSeek R1 Distill 8B — lighter distilled model using `<think>` tag parsing; does not support the `thinking` config parameter |
## How It Works
Reasoning is automatic — you do not need to set any special parameters. When you send a request to a reasoning model, it internally generates a chain-of-thought before producing the final answer. The final response you receive is the polished answer after the model has finished reasoning.
There is nothing to configure. Just use a reasoning model and the model handles the rest.
### Python Example

```python
from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

response = client.chat.completions.create(
    model="deepseek-r1-distill-70b",
    messages=[
        {
            "role": "user",
            "content": (
                "A farmer has 3 fields. The first field is twice the size of the second. "
                "The third field is 10 acres larger than the first. Together they total "
                "130 acres. How large is each field?"
            ),
        },
    ],
)

print(response.choices[0].message.content)
```

Example output:

```
Let me work through this step by step.

Let the second field be x acres.
- First field: 2x acres
- Third field: 2x + 10 acres

Setting up the equation:
x + 2x + (2x + 10) = 130
5x + 10 = 130
5x = 120
x = 24

So:
- Second field: 24 acres
- First field: 48 acres
- Third field: 58 acres

We can verify: 24 + 48 + 58 = 130 ✓
```

### Node.js Example
```javascript
import Tensoras from "tensoras";

const client = new Tensoras({ apiKey: "tns_your_key_here" });

const response = await client.chat.completions.create({
  model: "qwen-3-32b",
  messages: [
    {
      role: "user",
      content:
        "I have 5 shirts, 4 pairs of pants, and 3 pairs of shoes. " +
        "If I must wear one of each, and I refuse to wear my red shirt " +
        "with my brown shoes, how many valid outfits can I make?",
    },
  ],
});

console.log(response.choices[0].message.content);
```

## When to Use Reasoning Models
Reasoning models shine on tasks that benefit from deliberate, multi-step thinking:
- Math and quantitative problems — word problems, algebra, combinatorics
- Logic puzzles and constraints — scheduling, deduction, constraint satisfaction
- Code generation and debugging — complex algorithms, tricky edge cases
- Scientific analysis — multi-step derivations, data interpretation
- Planning and strategy — breaking down complex tasks into steps
For straightforward tasks like summarization, translation, or simple Q&A, standard models like llama-3.3-70b are typically faster and equally capable.
## Streaming with Reasoning Models
Reasoning models support streaming just like all other models. Set `stream=True` to receive tokens as they are generated. See Streaming for details.
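As a sketch of consuming such a stream, the helper below accumulates delta text into the final answer string. The `collect_stream` function and the stubbed chunk objects are illustrative, not part of the SDK, but they mirror the `chunk.choices[0].delta.content` shape used in the streaming examples elsewhere in this guide.

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Accumulate streamed delta content into the final answer string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # Skip chunks whose delta carries no content (e.g. role or stop chunks)
        if getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)

# Stubbed chunks standing in for a real stream from the API
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
    for text in ["The answer ", "is 144.", None]
]
print(collect_stream(fake_stream))  # The answer is 144.
```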
## Compatibility

Reasoning models are fully compatible with all other features.
# Extended Thinking
Extended thinking lets the model show its chain-of-thought reasoning as structured content blocks. Enable it with the thinking parameter.
## Enabling Thinking
```python
from tensoras import Tensoras

client = Tensoras(api_key="tns_your_key_here")

response = client.chat.completions.create(
    model="deepseek-r1-distill-70b",
    messages=[{"role": "user", "content": "What is the sum of angles in a polygon with n sides?"}],
    thinking={"type": "enabled", "budget_tokens": 4096},
)

# Access the full content block list
for block in response.choices[0].message.content:
    if block["type"] == "thinking":
        print("Reasoning:", block["thinking"])
    elif block["type"] == "text":
        print("Answer:", block["text"])
```

```typescript
import Tensoras from "tensoras";

const client = new Tensoras({ apiKey: "tns_your_key_here" });

const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-70b",
  messages: [{ role: "user", content: "What is the sum of angles in a polygon with n sides?" }],
  thinking: { type: "enabled", budget_tokens: 4096 },
});

const content = response.choices[0].message.content;
if (Array.isArray(content)) {
  for (const block of content) {
    if (block.type === "thinking") {
      console.log("Reasoning:", block.thinking);
    } else if (block.type === "text") {
      console.log("Answer:", block.text);
    }
  }
}
```

## Response Format
When thinking is enabled, `message.content` becomes an array of content blocks:

- `{type: "thinking", thinking: "..."}` — the model's reasoning process
- `{type: "text", text: "..."}` — the final answer
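A small helper, not part of the SDK, can normalize both shapes a response may take: the plain string returned when thinking is disabled, and the content-block list described above.

```python
def split_content(content):
    """Separate reasoning from the final answer in message.content.

    Handles both shapes: a plain string (thinking disabled) or a list
    of content blocks (thinking enabled).
    """
    if isinstance(content, str):
        return "", content
    thinking = "".join(b["thinking"] for b in content if b["type"] == "thinking")
    text = "".join(b["text"] for b in content if b["type"] == "text")
    return thinking, text

blocks = [
    {"type": "thinking", "thinking": "Sum of interior angles is (n - 2) * 180."},
    {"type": "text", "text": "(n - 2) x 180 degrees."},
]
print(split_content(blocks)[1])          # (n - 2) x 180 degrees.
print(split_content("Plain answer.")[1])  # Plain answer.
```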
```python
response = client.chat.completions.create(
    model="deepseek-r1-distill-70b",
    messages=[{"role": "user", "content": "Solve: if 3x + 7 = 22, what is x?"}],
    thinking={"type": "enabled", "budget_tokens": 2048},
)

content = response.choices[0].message.content
for block in content:
    if block["type"] == "thinking":
        # Internal reasoning — typically hidden from end users
        reasoning = block["thinking"]
    elif block["type"] == "text":
        # Final answer shown to users
        answer = block["text"]
```

```typescript
import type { ThinkingContentBlock, TextContentBlock } from "tensoras";

const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-70b",
  messages: [{ role: "user", content: "Solve: if 3x + 7 = 22, what is x?" }],
  thinking: { type: "enabled", budget_tokens: 2048 },
});

const content = response.choices[0].message.content;
if (Array.isArray(content)) {
  const thinkingBlocks = content.filter(
    (b): b is ThinkingContentBlock => b.type === "thinking",
  );
  const textBlocks = content.filter(
    (b): b is TextContentBlock => b.type === "text",
  );
}
```

## Token Budget
`budget_tokens` controls how many tokens the model can use for thinking. The valid range is 1024–16384; values outside this range are clamped at the gateway. Higher budgets produce more thorough reasoning but are slower and costlier.
| Budget | Use When |
|---|---|
| 1024 | Simple problems needing brief reasoning |
| 4096 | Standard complex reasoning |
| 16384 | Maximum depth problems |
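The clamping rule described above can be sketched as a one-liner. This `clamp_budget` helper is illustrative only; the gateway performs the clamping for you.

```python
def clamp_budget(budget_tokens: int) -> int:
    """Clamp budget_tokens to the gateway's valid range of 1024-16384."""
    return max(1024, min(budget_tokens, 16384))

print(clamp_budget(512))    # 1024
print(clamp_budget(4096))   # 4096
print(clamp_budget(50000))  # 16384
```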
## Billing

Thinking tokens are billed at 50% of the standard output token rate, since they represent intermediate computation rather than final output. For example, 1000 thinking tokens cost the same as 500 output tokens. They appear in `usage.completion_tokens_details.thinking_tokens`.
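As an illustrative sketch of the 50% rule, the `thinking_token_cost` helper and the rate value below are hypothetical, not part of the SDK or the published price list.

```python
def thinking_token_cost(thinking_tokens: int, output_rate: float) -> float:
    """Cost of thinking tokens, billed at 50% of the output-token rate."""
    return thinking_tokens * output_rate * 0.5

rate = 2e-6  # hypothetical per-token output rate, for illustration only
# 1000 thinking tokens cost the same as 500 output tokens
assert thinking_token_cost(1000, rate) == 500 * rate
```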
```python
response = client.chat.completions.create(
    model="deepseek-r1-distill-70b",
    messages=[{"role": "user", "content": "Hard problem"}],
    thinking={"type": "enabled", "budget_tokens": 8192},
)

usage = response.usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
if usage.completion_tokens_details:
    print(f"Thinking tokens: {usage.completion_tokens_details.thinking_tokens}")
    # Thinking tokens are billed at half the output token rate
```

## Supported Models

Extended thinking (the `thinking` config parameter) works with the following reasoning-capable models:

- `deepseek-r1-distill-70b`
- `qwen-3-32b` (with reasoning mode)

Note: `deepseek-r1-distill-8b` is not supported for extended thinking via the `thinking` parameter. That model uses `<think>` tag parsing internally and does not expose structured thinking blocks.
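For intuition, the `<think>` tag convention can be sketched as below. This `parse_think_tags` helper is illustrative only; the gateway performs this parsing internally for deepseek-r1-distill-8b, so you never need to do it yourself.

```python
import re

def parse_think_tags(raw: str):
    """Split a raw <think>-tagged completion into (reasoning, answer)."""
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", raw, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    # No tags present: treat the whole completion as the answer
    return "", raw.strip()

raw = "<think>17 x 24 = 17 x 25 - 17 = 408.</think>The product is 408."
reasoning, answer = parse_think_tags(raw)
print(answer)  # The product is 408.
```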
## Streaming with Extended Thinking

When extended thinking is enabled alongside `stream=True`, each streamed chunk's `delta` includes a `content_type` field indicating whether the delta is part of a thinking block or the final text answer.
### Python

```python
with client.chat.completions.stream(
    model="deepseek-r1-distill-70b",
    messages=[{"role": "user", "content": "Solve: what is 17 × 24?"}],
    thinking={"type": "enabled", "budget_tokens": 2048},
) as stream:
    thinking_chunks = []
    text_chunks = []
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content_type == "thinking" and delta.content:
            thinking_chunks.append(delta.content)
        elif delta.content_type == "text" and delta.content:
            text_chunks.append(delta.content)

thinking = "".join(thinking_chunks)
answer = "".join(text_chunks)
print("Reasoning:", thinking)
print("Answer:", answer)
```

### TypeScript
```typescript
const stream = await client.chat.completions.stream({
  model: "deepseek-r1-distill-70b",
  messages: [{ role: "user", content: "Solve: what is 17 × 24?" }],
  thinking: { type: "enabled", budget_tokens: 2048 },
});

const thinkingChunks: string[] = [];
const textChunks: string[] = [];

for await (const chunk of stream) {
  const delta = chunk.choices[0].delta;
  if (delta.content_type === "thinking" && delta.content) {
    thinkingChunks.push(delta.content);
  } else if (delta.content_type === "text" && delta.content) {
    textChunks.push(delta.content);
  }
}

const thinking = thinkingChunks.join("");
const answer = textChunks.join("");
console.log("Reasoning:", thinking);
console.log("Answer:", answer);
```

## Stripping Thinking Blocks Before Display
In most production UIs you will want to display only the final answer text to users, not the raw reasoning. Use this pattern to extract just the text content from a non-streaming response:
```python
# Strip thinking blocks — show only final answer text to users
text_content = " ".join(
    block["text"]
    for block in response.choices[0].message.content
    if block["type"] == "text"
)
print(text_content)
```

## Best Practices
- Hide thinking blocks from end users in production UIs — they are intermediate computation
- Use thinking for complex multi-step problems: math, logic, code reasoning, planning
- Start with a `budget_tokens` of 4096 and adjust based on response quality
- Plain string responses are still returned when thinking is disabled or not set
## Related
- Streaming — stream reasoning model responses in real time
- Chat Completions API — full endpoint reference
- Models API — list available models