Announcing new fine-tuning models and techniques in Azure AI Foundry


Today, we’re excited to announce three major enhancements to model fine-tuning in Azure AI Foundry: Reinforcement Fine-Tuning (RFT) with o4-mini (coming soon), and Supervised Fine-Tuning (SFT) for the GPT-4.1-nano and Llama 4 Scout models (available now). These updates reflect our continued commitment to empowering organizations with tools to build highly customized, domain-adapted AI systems for real-world impact.

With these new models, we’re unlocking major avenues of LLM customization: GPT-4.1-nano is a powerful small model, ideal for distillation; o4-mini is the first reasoning model you can fine-tune; and Llama 4 Scout is a best-in-class open-source model.

Reinforcement Fine-Tuning with o4-mini 

Reinforcement Fine-Tuning introduces a new level of control for aligning model behavior with complex business logic. By rewarding accurate reasoning and penalizing undesirable outputs, RFT improves model decision-making in dynamic or high-stakes environments.

Coming soon for the o4-mini model, RFT unlocks new possibilities for use cases requiring adaptive reasoning, contextual awareness, and domain-specific logic, all while maintaining fast inference performance.
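At the heart of RFT is a reward signal: a grader that scores each candidate response so that better reasoning is reinforced. The following is a minimal conceptual sketch of such a grader in Python; the scoring rules and partial-credit thresholds are invented for illustration and are not an Azure API.

```python
def grade_response(response: str, reference: str) -> float:
    """Score a model response between 0.0 (penalized) and 1.0 (rewarded).

    Gives full reward for an exact match, partial credit when the
    reference answer appears inside a longer explanation, and no
    reward otherwise. The tiers here are purely illustrative.
    """
    response, reference = response.strip().lower(), reference.strip().lower()
    if response == reference:
        return 1.0   # correct and concise: full reward
    if reference in response:
        return 0.5   # correct answer buried in extra text: partial credit
    return 0.0       # incorrect: no reward


# During RFT, responses with higher scores are reinforced,
# nudging the model toward the desired reasoning behavior.
print(grade_response("42", "42"))
print(grade_response("The answer is 42.", "42"))
print(grade_response("I'm not sure.", "42"))
```

In practice, production graders for domains like legal review would check structure, citations, and compliance rules rather than simple string matches, but the reward-shaping idea is the same.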

Real-world impact: DraftWise

DraftWise, a legal tech startup, used reinforcement fine-tuning (RFT) in Azure AI Foundry Models to enhance the performance of reasoning models tailored for contract generation and review. Faced with the challenge of delivering highly contextual, legally dependable suggestions to lawyers, DraftWise fine-tuned Azure OpenAI models using proprietary legal data to improve output accuracy and adapt to nuanced user prompts. This led to a 30% improvement in search result quality, enabling lawyers to draft contracts faster and focus on high-value advisory work.

Reinforcement fine-tuning on reasoning models is a potential game changer for us. It’s helping our models understand the nuance of legal language and respond more intelligently to complex drafting instructions, which promises to make our product significantly more useful to lawyers in real time.

—James Ding, founder and CEO of DraftWise

When should you use Reinforcement Fine-Tuning?

Reinforcement Fine-Tuning is best suited for use cases where adaptability, iterative learning, and domain-specific behavior are essential. You should consider RFT if your scenario involves:

  1. Custom Rule Implementation: RFT thrives in environments where decision logic is highly specific to your organization and cannot be easily captured through static prompts or traditional training data. It enables models to learn flexible, evolving rules that reflect real-world complexity. 
  2. Domain-Specific Operational Standards: Ideal for scenarios where internal procedures diverge from industry norms, and where success depends on adhering to those bespoke standards. RFT can effectively encode procedural variations, such as extended timelines or modified compliance thresholds, into the model’s behavior. 
  3. High Decision-Making Complexity: RFT excels in domains with layered logic and variable-rich decision trees. When outcomes depend on navigating many subcases or dynamically weighing multiple inputs, RFT helps models generalize across complexity and deliver more consistent, accurate decisions. 

Example: Wealth advisory at Contoso Wellness

To showcase the potential of RFT, consider Contoso Wellness, a fictitious wealth advisory firm. Using RFT, the o4-mini model learned to adapt to unique business rules, such as identifying optimal client interactions based on nuanced patterns like the ratio of a client’s net worth to available funds. This enabled Contoso to streamline their onboarding processes and make more informed decisions faster.
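To make the Contoso rule concrete, here is a sketch of the kind of net-worth-to-available-funds logic the model might learn. Since Contoso is fictitious, the thresholds and tier names below are invented; the point is that RFT can teach a model rules like this from reward signals rather than hand-written code.

```python
def outreach_priority(net_worth: float, available_funds: float) -> str:
    """Classify a client for outreach based on the ratio of net worth
    to available (liquid) funds. Thresholds are illustrative only."""
    if available_funds <= 0:
        return "review"  # no liquid funds on record: route to manual review
    ratio = net_worth / available_funds
    if ratio >= 10:
        return "high"    # wealth is mostly illiquid: strong advisory opportunity
    if ratio >= 3:
        return "medium"  # partially illiquid: schedule a check-in
    return "low"         # mostly liquid: standard onboarding path


# A client with $1M net worth but only $50k liquid is a high-priority lead.
print(outreach_priority(1_000_000, 50_000))
```

With RFT, such rules need not be encoded explicitly; a grader that rewards correct prioritization decisions lets the model absorb them during training.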

Supervised Fine-Tuning now available for GPT-4.1-nano

We’re also bringing Supervised Fine-Tuning (SFT) to the GPT-4.1-nano model, a small but powerful foundation model optimized for high-throughput, cost-sensitive workloads. With SFT, you can instill your model with company-specific tone, terminology, workflows, and structured outputs, all tailored to your domain. This model will be available for fine-tuning in the coming days.
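SFT trains on example prompt-response pairs, commonly supplied as chat-style JSONL. A minimal sketch of preparing such a file is below; the Contoso support examples and the file name are invented for illustration, and a real training set would need many more examples.

```python
import json

# Hypothetical examples teaching a company-specific tone and phrasing.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are Contoso's support assistant. Be concise and friendly."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Happy to help! Go to Settings > Security > Reset Password."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are Contoso's support assistant. Be concise and friendly."},
            {"role": "user", "content": "Can I get a refund?"},
            {"role": "assistant", "content": "Of course! Refunds are available within 30 days of purchase."},
        ]
    },
]

# JSONL (one JSON object per line) is the format commonly expected
# by chat-model fine-tuning services.
with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

Once uploaded, a file like this drives the supervised training run; consult the Azure AI Foundry fine-tuning documentation for the exact upload and job-submission flow.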

Why fine-tune GPT-4.1-nano?

  • Precision at Scale: Tailor the model’s responses while maintaining speed and efficiency. 
  • Enterprise-Grade Output: Ensure alignment with business processes and tone-of-voice. 
  • Lightweight and Deployable: Perfect for scenarios where latency and cost matter, such as customer service bots, on-device processing, or high-volume document parsing. 

Compared to larger models, GPT-4.1-nano delivers faster inference and lower compute costs, making it well suited for large-scale workloads like:

  • Customer support automation, where models must handle thousands of tickets per hour with consistent tone and accuracy. 
  • Internal knowledge assistants that follow company style and protocol when summarizing documentation or responding to FAQs. 

As a small, fast, but highly capable model, GPT-4.1-nano makes a great candidate for distillation as well. You can use models like GPT-4.1 or o4 to generate training data, or capture production traffic with stored completions, and teach GPT-4.1-nano to be just as smart!
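The distillation recipe above amounts to reshaping a larger teacher model's completions into SFT examples for GPT-4.1-nano. The sketch below uses hard-coded stand-ins for teacher outputs rather than real API calls, and the file name is arbitrary.

```python
import json

# Stand-ins for (prompt, completion) pairs collected from a larger
# teacher model such as GPT-4.1 or from stored production completions.
teacher_outputs = [
    ("Summarize this announcement in one sentence.",
     "Azure AI Foundry adds RFT for o4-mini and SFT for GPT-4.1-nano and Llama 4 Scout."),
    ("Classify the sentiment of: 'Great release!'", "positive"),
]

def to_training_example(prompt: str, completion: str) -> dict:
    """Convert one teacher interaction into a chat-format SFT example
    that the smaller student model can be fine-tuned on."""
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
    }

with open("distillation_data.jsonl", "w", encoding="utf-8") as f:
    for prompt, completion in teacher_outputs:
        f.write(json.dumps(to_training_example(prompt, completion)) + "\n")
```

Fine-tuning GPT-4.1-nano on data like this transfers the teacher's behavior at a fraction of the inference cost.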

Fine-tuning gpt-4.1-nano demo in Azure AI Foundry.

Llama 4 fine-tuning now available

We’re also excited to announce support for fine-tuning Meta’s Llama 4 Scout, a cutting-edge, 17-billion active parameter model that offers an industry-leading context window of 10M tokens while fitting on a single H100 GPU for inference. It’s a best-in-class model, more powerful than all previous-generation Llama models.

Llama 4 fine-tuning is available in our managed compute offering, allowing you to fine-tune and run inference using your own GPU quota. Available both in Azure AI Foundry and as Azure Machine Learning components, it gives you access to additional hyperparameters for deeper customization compared to our serverless experience.

Get started with Azure AI Foundry today

Azure AI Foundry is your foundation for enterprise-grade AI tuning. These fine-tuning enhancements unlock new frontiers in model customization, helping you build intelligent systems that think and respond in ways that reflect your business DNA.

  • Use Reinforcement Fine-Tuning with o4-mini to build reasoning engines that learn from experience and evolve over time. Coming soon in Azure AI Foundry, with regional availability in East US2 and Sweden Central. 
  • Use Supervised Fine-Tuning with GPT-4.1-nano to scale reliable, cost-efficient, and highly customized model behaviors across your organization. Available now in Azure AI Foundry in North Central US and Sweden Central. 
  • Try Llama 4 Scout fine-tuning to customize a best-in-class open-source model. Available now in the Azure AI Foundry model catalog and Azure Machine Learning. 

With Azure AI Foundry, fine-tuning isn’t just about accuracy: it’s about trust, efficiency, and adaptability at every layer of your stack.

Explore further: 

We’re just getting started. Stay tuned for more model support, advanced tuning techniques, and tools to help you build AI that’s smarter, safer, and uniquely yours.
