AISA-AR-FunctionCall-Think

Reasoning-Augmented Arabic Structured Tool Calling

AISA-AR-FunctionCall-Think is a reasoning-enhanced variant of the Arabic function-calling model introduced in the AISA-AR-FunctionCall framework. The model generates an intermediate reasoning trace before invoking a tool, enabling transparent decision-making for Arabic agentic systems.

This model extends AISA-AR-FunctionCall-FT by introducing explicit reasoning supervision using <think> blocks prior to tool execution.

Model Overview

Field	Value
Model name	AISA-AR-FunctionCall-Think
Base model	AISA-AR-FunctionCall-FT
Architecture	Gemma 3 (FunctionGemma 270M)
Training method	LoRA reasoning fine-tuning
Primary task	Arabic reasoning-aware function calling

The model produces outputs in the following pattern:

<think>
reasoning about tool selection
</think>
<start_function_call>
call:tool_name{arguments}
</end_function_call>

This allows the system to expose the reasoning behind tool selection.
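The pattern above can be separated into its reasoning trace and its call with a small parser. The following is a minimal sketch, not part of the released tooling: the names `split_output` and `ParsedOutput` are illustrative, and the regular expression is inferred only from the layout shown in this card. The closing marker is matched with or without a leading slash, on the assumption that serializers may differ on that detail.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the output layout documented in this card (assumption).
OUTPUT_RE = re.compile(
    r"<think>\s*(?P<reasoning>.*?)\s*</think>\s*"
    r"<start_function_call>\s*call:(?P<name>[A-Za-z_][A-Za-z0-9_]*)"
    r"\{(?P<args>.*?)\}\s*</?end_function_call>",
    re.DOTALL,
)


class ParsedOutput(NamedTuple):
    reasoning: str       # text found between <think> and </think>
    tool_name: str       # identifier following "call:"
    raw_arguments: str   # unparsed text between the outer braces


def split_output(text: str) -> Optional[ParsedOutput]:
    """Return the reasoning trace and the raw call, or None if nothing matches."""
    match = OUTPUT_RE.search(text)
    if match is None:
        return None
    return ParsedOutput(match["reasoning"], match["name"], match["args"])
```

Returning `None` instead of raising lets a caller treat outputs without a tool call (for example, a plain-text answer) as a normal case.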

Key Capabilities

Reasoning-aware tool selection
Explicit decision traces for tool invocation
Improved argument extraction consistency
Interpretable structured execution

Supported domains:

Domain
Travel
Utilities
Islamic services
Weather
Healthcare
Banking & finance
E-commerce
Government services

Supported Arabic dialect groups:

Modern Standard Arabic (MSA)
Gulf
Egyptian
Levantine
Maghrebi

Training Dataset

Training uses a subset of the AISA-AR-FunctionCall dataset with reasoning annotations.

Property	Value
Dataset size	~12k reasoning-augmented samples
Dialect coverage	5 Arabic dialect groups (including MSA)
Domains	8 real-world domains
Tools	27 structured tools

Training Methodology

The reasoning model is trained by augmenting assistant outputs with explicit reasoning segments.

Training format:

<think>
tool selection reasoning
</think>
<start_function_call>
call:tool{arguments}
</end_function_call>

The reasoning behavior learned during training is enforced at inference time by priming the model so that its generation begins with <think>.
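One way to prime generation with <think> is to append the tag to the prompt and then restore it in front of the continuation. The sketch below assumes a hypothetical `generate_fn` callable that takes a prompt string and returns only the newly generated text; it is not tied to any particular inference library, and the helper names are illustrative.

```python
from typing import Callable

THINK_OPEN = "<think>"


def prime_with_think(prompt: str) -> str:
    """Append the opening reasoning tag so decoding continues inside it."""
    return prompt + THINK_OPEN + "\n"


def generate_primed(prompt: str, generate_fn: Callable[[str], str]) -> str:
    """Run a completion function on the primed prompt and restore the prefix.

    generate_fn is a hypothetical callable mapping a prompt string to the
    generated continuation only (it must not echo the prompt back).
    """
    continuation = generate_fn(prime_with_think(prompt))
    # The tag was sent as part of the prompt, so the continuation lacks it;
    # put it back so downstream parsers see a complete <think> block.
    return THINK_OPEN + "\n" + continuation
```

Restoring the prefix keeps the returned text in the same shape as the training format, so the same parser can be used for primed and unprimed generations.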

Training configuration:

Parameter	Value
Training type	LoRA fine-tuning
LoRA rank	64
Alpha	64
Dropout	0.05
Trainable parameters	~5.36%
Epochs	3
Learning rate	3e-6
Effective batch size	32
Optimizer	8-bit AdamW
Scheduler	Cosine
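The table values can be captured in a small settings object, which also makes two derived quantities explicit. This is a sketch under stated assumptions: it assumes the standard LoRA scaling of alpha divided by rank, and the per-device batch size used in the usage note is hypothetical, since the card reports only the effective batch size. The `rank`, `alpha`, and `dropout` fields correspond to the `r`, `lora_alpha`, and `lora_dropout` arguments of PEFT's `LoraConfig`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    rank: int = 64                  # "LoRA rank" in the table
    alpha: int = 64                 # "Alpha"
    dropout: float = 0.05           # "Dropout"
    epochs: int = 3                 # "Epochs"
    learning_rate: float = 3e-6     # "Learning rate"
    effective_batch_size: int = 32  # "Effective batch size"

    @property
    def scaling(self) -> float:
        """Standard LoRA scaling applied to the low-rank update (alpha / rank)."""
        return self.alpha / self.rank


def accumulation_steps(effective: int, per_device: int, devices: int = 1) -> int:
    """Gradient-accumulation steps needed to reach an effective batch size."""
    per_step = per_device * devices
    if effective % per_step != 0:
        raise ValueError("effective batch size must be divisible by per_device * devices")
    return effective // per_step
```

With rank and alpha both set to 64, `LoraSettings().scaling` is 1.0. As an illustration only, a per-device batch of 4 on one device would need `accumulation_steps(32, 4)` accumulation steps to reach the reported effective batch size.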

Additional training signals include negative tool examples to reduce hallucinated tool calls when no tool invocation is required.

Evaluation Results

The model is evaluated on a strict reasoning subset of 240 samples.

Strict Evaluation (n = 240)

Metric	Score
Tool Call Rate	0.992
Think-Before-Call Rate	1.000
Function Name Accuracy	0.992
Argument F1	1.000
Decision Accuracy	0.992
Hallucination Rate	0.000

These results indicate that the model consistently performs reasoning before tool invocation and achieves near-perfect structured alignment within the evaluated subset.
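This card does not define how the metrics are computed. As a rough illustration, two of them could be implemented as below. Both definitions are assumptions: Argument F1 is taken as exact-match F1 over (name, value) pairs for a single sample, and Think-Before-Call Rate as the share of tool-calling outputs that open with a complete <think> block. The function names are illustrative.

```python
import re
from typing import Dict, List


def argument_f1(predicted: Dict[str, object], gold: Dict[str, object]) -> float:
    """F1 over exact-match (name, value) argument pairs for one sample."""
    if not predicted and not gold:
        return 1.0  # nothing expected and nothing produced
    # repr() keeps unhashable values (lists, dicts) usable inside a set.
    pred_pairs = {(k, repr(v)) for k, v in predicted.items()}
    gold_pairs = {(k, repr(v)) for k, v in gold.items()}
    true_positives = len(pred_pairs & gold_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_pairs)
    recall = true_positives / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


THINK_THEN_CALL = re.compile(
    r"^\s*<think>.*?</think>\s*<start_function_call>", re.DOTALL
)


def think_before_call_rate(outputs: List[str]) -> float:
    """Share of tool-calling outputs whose call follows a <think> block."""
    calls = [o for o in outputs if "<start_function_call>" in o]
    if not calls:
        return 0.0
    return sum(bool(THINK_THEN_CALL.match(o)) for o in calls) / len(calls)
```

Averaging `argument_f1` over samples gives a macro-style score; a micro-averaged variant would pool the pairs across samples before computing precision and recall.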

Important Note on Format Validation

Standard function-call validators may classify reasoning outputs as parse failures because <think> tokens appear before the function call marker.

This does not indicate structural instability; it reflects a difference in serialization format. When reasoning segments are permitted, tool invocation correctness remains near-perfect.
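A simple way to reuse an existing validator is to strip the leading reasoning segment before validation. The sketch below is illustrative: `strict_validator` is a stand-in for a standard validator that expects the call marker first, not the API of any specific library, and `reasoning_aware` is a hypothetical wrapper name.

```python
import re
from typing import Callable

THINK_BLOCK = re.compile(r"^\s*<think>.*?</think>\s*", re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Drop one leading <think>...</think> block, if present."""
    return THINK_BLOCK.sub("", text, count=1)


def strict_validator(text: str) -> bool:
    """Stand-in for a standard validator: the call marker must come first."""
    return text.lstrip().startswith("<start_function_call>")


def reasoning_aware(validator: Callable[[str], bool]) -> Callable[[str], bool]:
    """Wrap a validator so that it ignores a leading reasoning segment."""
    return lambda text: validator(strip_reasoning(text))
```

Wrapping instead of modifying the validator keeps the original check available for models that do not emit reasoning traces.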

Example Usage

User query (in English: "What is the weather in Riyadh today?"):

ما حالة الطقس في الرياض اليوم؟

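Model output (the reasoning trace translates to: "The user wants to know the weather in the city of Riyadh, so the get_weather tool should be used."):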

<think>
المستخدم يريد معرفة حالة الطقس في مدينة الرياض، لذا يجب استخدام أداة get_weather.
</think>
<start_function_call>
call:get_weather{city:<escape>الرياض<escape>,days:1}
</end_function_call>
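To execute a call like the one above, the argument string inside the braces has to be turned into typed values. The following is a minimal sketch that handles flat key:value arguments only. The grammar is inferred from this single example (string values wrapped in <escape> markers, bare tokens for numbers and booleans) and is an assumption; nested objects and lists are not handled, and `parse_arguments` is an illustrative name.

```python
import re
from typing import Dict, Union

Scalar = Union[str, int, float, bool]

# One key:value pair. String values sit between <escape> markers;
# any other value runs up to the next comma or the end of the input.
PAIR_RE = re.compile(
    r"\s*(?P<key>[A-Za-z_][A-Za-z0-9_]*)\s*:\s*"
    r"(?:<escape>(?P<text>.*?)<escape>|(?P<bare>[^,]*))\s*(?:,|$)",
    re.DOTALL,
)


def _coerce(token: str) -> Scalar:
    """Convert a bare token to bool, int, or float, else keep it as a string."""
    token = token.strip()
    if token in ("true", "false"):
        return token == "true"
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_arguments(raw: str) -> Dict[str, Scalar]:
    """Parse a flat argument string such as 'city:<escape>X<escape>,days:1'."""
    arguments: Dict[str, Scalar] = {}
    position = 0
    while position < len(raw):
        match = PAIR_RE.match(raw, position)
        if match is None:
            raise ValueError(f"cannot parse arguments at offset {position}")
        text = match["text"]
        arguments[match["key"]] = text if text is not None else _coerce(match["bare"])
        position = match.end()
    return arguments
```

Applied to the example, `parse_arguments("city:<escape>الرياض<escape>,days:1")` yields the city as a string and `days` as the integer 1, ready to be passed to a `get_weather` implementation.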