Introducing Fireworks AI on Microsoft Foundry: Bringing high performance, low latency open model inference to Azure

1 month ago 23

We’re announcing the nationalist preview of Fireworks AI connected Microsoft Foundry, bringing high‑performance unfastened exemplary inference into Azure. This integration reflects Microsoft Foundry’s broader direction: providing a azygous spot wherever developers tin not lone tally unfastened models efficiently but besides customize and operationalize them arsenic portion of a implicit enterprise‑ready AI lifecycle.

Across industries, organizations are progressively standardizing connected unfastened models to summation greater power implicit performance, cost, customization, and the information and compliance required for endeavor deployment. Open models springiness teams the flexibility to take the close architecture for each workload and debar lock‑in to a azygous exemplary supplier arsenic their needs evolve.

As adoption grows, however, show unsocial is nary longer enough. Teams request a accordant mode to measure models quickly, run them safely successful production, and amended them implicit clip without rebuilding infrastructure oregon fragmenting their tooling. Too often, organizations are forced to assemble bespoke serving stacks, slowing innovation and making it harder to standard and compound progress.

Microsoft Foundry is designed to code this challenge. It serves arsenic a unified strategy of grounds and endeavor power level for AI, bringing unneurotic models, agents, evaluation, deployment, and governance into a azygous experience. With Microsoft Foundry, teams tin determination from experimentation to accumulation with confidence, utilizing the models and frameworks that champion acceptable their requirements, portion relying connected a accordant operational foundation.

Today, we’re announcing the nationalist preview of Fireworks AI connected Microsoft Foundry, bringing high‑performance unfastened exemplary inference into Azure. This integration reflects Microsoft Foundry’s broader direction: providing a azygous spot wherever developers tin not lone tally unfastened models efficiently but besides customize and operationalize them arsenic portion of a implicit enterprise‑ready AI lifecycle.

Fireworks AI models connected Microsoft Foundry: A azygous spot for unfastened models

Fireworks AI delivers industry-leading inference for unfastened models, and Microsoft Foundry is what makes that show usable astatine endeavor scale. Accessing Fireworks AI done Microsoft Foundry gives teams a single, trusted power level to evaluate, deploy, customize, and run unfastened models alongside the remainder of their AI stack.

As unfastened models mature, customization progressively extends beyond training. Teams request accordant ways to configure, deploy, optimize, govern, and iterate connected models successful accumulation without fragmenting tools oregon infrastructure. Microsoft Foundry provides the situation wherever these customization and operational workflows are standardized, portion Fireworks AI supplies the show and ratio needed to tally unfastened models astatine scale. This means teams tin determination from experimentation to accumulation utilizing unfastened models without stitching unneurotic abstracted tools, contracts, and deployment paths.

Together, Fireworks AI and Microsoft Foundry alteration a much implicit and sustainable attack to moving with unfastened models combining fast, businesslike inference with a level designed to enactment endeavor unfastened exemplary operations implicit time.

With Fireworks AI connected Foundry, developers tin get entree to best-in-class inferencing for unfastened models, including optimized deployments for customized value models. Fireworks AI is simply a marketplace person for precocious show inference for unfastened models. Its motor already runs astatine net standard processing implicit 13T tokens daily, sustaining astir 180 1000 requests per second, and generating implicit 1,000 tokens per 2nd connected ample models, substantiated by starring benchmark show on Artificial Analysis. This show is present disposable connected Foundry.

Developers tin log into Foundry and entree these unfastened models with Fireworks AI today:

DeepSeek V3.2
OpenAI gpt-oss-120b
Kimi K2.5
MiniMax M2.5 (new)

This brings a caller unfastened exemplary (MiniMax M2.5) to Foundry with serverless enactment and offers optimized inference for already fashionable unfastened models.

With Fireworks AI successful Microsoft Foundry, developers can:

Evaluate models faster with day‑zero entree and support: Start gathering instantly with entree to state-of-the-art unfastened models from Fireworks AI done a azygous Azure endpoint via Foundry.
Optimize inference: Requests to unfastened models are served by Fireworks’ high‑throughput inference stack for accelerated show with Azure‑grade governance.
Run the models you already trust: With bring-your-own-weights (BYOW), you tin upload and registry quantized oregon fine‑tuned weights trained elsewhere without changing the serving stack.

A demo of customized exemplary creation.

Choose the close pricing exemplary for your workload: Use serverless, pay-per‑token inference to experimentation securely and rapidly with Data Zone Standard oregon take provisioned throughput units (PTUs) for predictable, steady-state show with basal oregon customized models. Whether you’re optimizing for agility oregon efficiency, you get flexibility without managing infrastructure.
Operate with endeavor spot and scale: We are committed to enabling customers to physique production-ready AI applications quickly, portion maintaining the highest levels of information and security. Foundry provides an end-to-end workspace for cause development, evaluation, and deployment, including unified governance, observability, and agent-ready tooling.

The aboriginal of Fireworks and AI usage cases

Microsoft Foundry is evolving to enactment the afloat lifecycle of unfastened models—from aboriginal valuation done accumulation cognition and ongoing optimization. As teams standard their usage of unfastened models, having a consistent, enterprise‑ready instauration becomes progressively important.

By integrating Fireworks AI into Microsoft Foundry, developers summation entree to high‑performance inference contiguous portion gathering connected a level designed to enactment deeper customization and endeavor operations implicit time. This attack gives teams the assurance to follow unfastened models not conscionable for what they tin bash now, but for however they tin grow, adapt, and run reliably arsenic their AI ambitions expand. We’re looking guardant to seeing however developers and enterprises usage Fireworks AI connected Microsoft Foundry to powerfulness the adjacent procreation of intelligent applications.

To get started:

Go to Microsoft Foundry models and prime Fireworks AI unfastened models successful the exemplary catalog collection.
Select the unfastened exemplary hosted by Fireworks.
View the exemplary card.
Select your deployment option—serverless oregon PTU—and deploy.

Learn much astir Fireworks connected Microsoft Foundry

Learn much astir Fireworks connected Microsoft Foundry.
Learn however to upload customized value models for inferencing with Fireworks connected Foundry.
Join Fireworks connected Model Mondays connected March 23 unrecorded connected YouTube oregon connected demand.
Explore The Fast Track to AI Apps and Agents for a roadmap to build, deploy, and standard AI-native solutions with Azure.

Read Entire Article