Introduction
Magistral Small (24B), released by Mistral AI in June 2025, marks the company’s first model explicitly focused on complex, domain-specific reasoning capabilities [1.3, 2.1]. Built on the foundation of the Mistral Small 3.1 model, the 24-billion-parameter model utilizes a specialized training regimen combining Supervised Fine-Tuning (SFT) traces from its more powerful sibling, Magistral Medium, with a custom Reinforcement Learning (RL) pipeline [1.4, 1.8]. This hybrid SFT+RL approach elevates its performance in tasks requiring long chains of logic, particularly in mathematics and coding.
- TL;DR: Magistral Small (24B) is a highly efficient, 24-billion-parameter open-source model from Mistral AI, released under the Apache 2.0 License. Its standout feature is superior reasoning performance in math and code, achieved through a unique SFT combined with RL training pipeline. The model’s compact size allows for easy local deployment, potentially running on a single RTX 4090 or a 32GB RAM MacBook once quantized [1.4].
Technical Architecture and Training Methodology
The design of Magistral Small is centered on maximizing the traceability and transparency of its reasoning, a key requirement for enterprise-grade reasoning applications.
SFT and Scalable RL Pipeline
Mistral AI adopted a ground-up approach to training, demonstrating the effectiveness of their custom, scalable RL pipeline.
- SFT from Traces: The model’s initial fine-tuning (SFT) was conducted using reasoning traces specifically derived from the training of the larger, enterprise-focused Magistral Medium model. This technique effectively distilled high-quality reasoning capability into the smaller model.
- RL on Text-Only Data: Following SFT, the model underwent an RL phase trained on text data alone. Mistral's research indicates that this text-only RL maintains or even improves multimodal understanding and function calling, and that the RL stage meaningfully refines the model beyond the SFT baseline.
- Transparent Reasoning: The model is trained to wrap its reasoning in `<think>` and `</think>` tags before giving its final answer, providing a traceable thought process for verification and interpretability (see the parsing sketch after this list).
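As a minimal sketch of how that transparency can be used in practice, the snippet below separates the reasoning trace from the final answer. It assumes the model emits its chain of thought inside `<think>...</think>` followed by the user-facing answer, as described above; the function name and regex are illustrative, not part of Mistral's tooling.

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a Magistral-style response into (reasoning trace, final answer).

    Assumes the chain of thought is wrapped in <think>...</think> and the
    user-facing answer follows the closing tag; illustrative, not official.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        # No explicit trace found: treat the whole response as the answer.
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer


example = "<think>2 + 2 equals 4 because ...</think>The answer is 4."
trace, answer = split_reasoning(example)
print(trace)   # "2 + 2 equals 4 because ..."
print(answer)  # "The answer is 4."
```

Keeping the trace and the answer separate makes it easy to log the reasoning for audit purposes while showing end users only the final answer.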
Deployment and Accessibility
The model is released under the Apache 2.0 License, granting users the freedom to use and modify it for both commercial and non-commercial projects. Its 24B parameter count keeps it small enough for efficient local deployment.
| Characteristic | Specification | Implication |
|---|---|---|
| Parameters | 24 Billion | Efficient scale for reasoning tasks |
| License | Apache 2.0 | Enables unrestricted commercial and non-commercial use |
| Context Window | 128k Tokens | Supports extensive context processing |
| Local Deployment | Quantized versions fit on a single RTX 4090 GPU | High-performance inference is accessible on consumer-grade hardware |
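To give a rough idea of what local use can look like, the sketch below queries the model through an OpenAI-compatible endpoint served on the same machine (for example, by a local inference server hosting `mistralai/Magistral-Small-2506`). The host, port, API key, and sampling parameters are illustrative assumptions, not Mistral-recommended settings.

```python
from openai import OpenAI

# Assumes a locally running OpenAI-compatible server hosting the model;
# the base URL, dummy API key, and sampling values are illustrative only.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mistralai/Magistral-Small-2506",
    messages=[
        {"role": "user", "content": "Prove that the sum of two even integers is even."},
    ],
    temperature=0.7,
    max_tokens=2048,
)

# The returned text contains the <think> trace followed by the final answer.
print(response.choices[0].message.content)
```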
Why it matters: The SFT+RL hybrid training method is the foundational reason for the model’s high performance in complex tasks, offering a path for smaller models to achieve reasoning quality comparable to much larger or proprietary alternatives. The open-source license and efficient sizing make this high-tier performance widely accessible.
Performance in Logic-Heavy Domains: Math and Code
Magistral Small is specifically designed to excel in structured, multi-step logical challenges, such as those found in advanced mathematics and programming.
Benchmark Highlights
The model demonstrates strong performance on reasoning-focused benchmarks:
| Benchmark | Performance Area | Notes |
|---|---|---|
| AIME2024 | Math Competition | Strong performance on advanced mathematical reasoning |
| HumanEval | Code Generation | Competitive code generation capabilities |
| MATH | Mathematical Reasoning | Solid capability in mathematical problem-solving |
The performance boost observed when combining SFT and RL, particularly on mathematical benchmarks, suggests that the RL stage is effective at refining the model’s ability to execute long, correct reasoning chains.
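To make the RL idea concrete: reinforcement learning on math and code typically relies on a verifiable reward, i.e. a programmatic check of the final answer. The helper below is a hypothetical, heavily simplified illustration of such a check (exact match on a `\boxed{...}` answer); it is not Mistral's actual reward implementation, and the boxed-answer convention is an assumption for this sketch.

```python
import re

def verifiable_math_reward(response: str, ground_truth: str) -> float:
    """Toy reward for RL on math: 1.0 if the final boxed answer matches, else 0.0.

    Hypothetical simplification of a verifiable-reward check; assumes the
    model states its final answer as \\boxed{...}, which may differ from
    Mistral's actual reward formatting rules.
    """
    # Strip the reasoning trace so only the final answer is graded.
    answer_part = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    boxed = re.findall(r"\\boxed\{([^}]*)\}", answer_part)
    if not boxed:
        return 0.0
    return 1.0 if boxed[-1].strip() == ground_truth.strip() else 0.0


print(verifiable_math_reward(r"<think>...</think>The answer is \boxed{42}.", "42"))  # 1.0
print(verifiable_math_reward(r"<think>...</think>\boxed{41}", "42"))                 # 0.0
```

Because the reward is computed automatically from the final answer, the RL stage can scale to large volumes of math and coding problems without human grading, which is what lets it sharpen long reasoning chains beyond the SFT baseline.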
Enterprise Use Cases
Its core strengths position Magistral Small ideally for several high-value enterprise applications:
- Software Development: Designed for programmatic logic, project planning, and backend architecture design through sequenced, multi-step actions.
- Regulated Industries (Legal, Finance): Provides traceable, verifiable thought processes essential for compliance and auditability in high-stakes environments.
- Business Strategy: Capable of executing complex risk assessments, financial modeling, and operational optimization tasks involving multiple constraints.
Why it matters: By prioritizing verifiable, multi-step reasoning, Magistral Small is designed as a specialized tool for solving complex, structured problems where output accuracy and the ability to audit the intermediate steps are important requirements.
Conclusion
Magistral Small (24B) represents an advancement for smaller, openly available LLMs in the realm of complex reasoning. Through a targeted SFT+RL training process, it aims to deliver strong performance in domains like math and code. Its Apache 2.0 license and ability to be deployed efficiently on widely available hardware make it accessible to the global developer community.
Summary
- The model is Mistral AI’s 24B open-source (Apache 2.0) offering, focused on transparent, multi-step reasoning
- Training uses a blend of SFT (leveraging traces from Magistral Medium) and a custom RL pipeline
- Demonstrates strong performance in math and coding tasks, validating the hybrid training approach
- Its compact size and open license enable efficient, local deployment on hardware like the RTX 4090
Recommended Hashtags
#AI #OpenSourceAI #LLM #MagistralSmall #MistralAI #ReasoningModel #CodeAI #LLMOps #Apache20
References
- “Magistral: Reasoning Models with Better Thinking” | arXiv | 2025
- “Magistral - Mistral AI” | Mistral AI | 2025
- “mistralai/Magistral-Small-2506” | Hugging Face | 2025