New offerings in Azure AI Foundry give businesses an enterprise-grade platform to build, deploy, and scale AI applications and agents.
Microsoft and NVIDIA are deepening our partnership to power the next wave of AI innovation. For years, our companies have helped fuel the AI revolution, bringing the world's most advanced supercomputing to the cloud, enabling breakthrough frontier models, and making AI more accessible to organizations everywhere. Today, we're building on that foundation with new advancements that deliver greater performance, capability, and flexibility.
With added support for NVIDIA RTX PRO 6000 Blackwell Server Edition on Azure Local, customers can deploy AI and visual computing workloads in distributed and edge environments with the same seamless orchestration and management they use in the cloud. New NVIDIA Nemotron and NVIDIA Cosmos models in Azure AI Foundry give businesses an enterprise-grade platform to build, deploy, and scale AI applications and agents. With NVIDIA Run:ai on Azure, enterprises can get more from every GPU to streamline operations and accelerate AI. Finally, Microsoft is redefining AI infrastructure with the world's first deployment of NVIDIA GB300 NVL72.
Today's announcements mark the next chapter in our full-stack AI collaboration with NVIDIA, empowering customers to build the future faster.
Expanding GPU support to Azure Local
Microsoft and NVIDIA continue to drive advancements in artificial intelligence, offering innovative solutions that span the public and private cloud, the edge, and sovereign environments.
As highlighted in the March blog post for NVIDIA GTC, Microsoft will offer NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Azure. Now, with expanded availability of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Azure Local, organizations can optimize their AI workloads, regardless of location, giving customers greater flexibility and more options than ever. Azure Local leverages Azure Arc to let organizations run advanced AI workloads on-premises while retaining the management simplicity of the cloud, or operate in fully disconnected environments.
NVIDIA RTX PRO 6000 Blackwell GPUs provide the performance and flexibility needed to accelerate a wide range of use cases, from agentic AI, physical AI, and scientific computing to rendering, 3D graphics, digital twins, simulation, and visual computing. This expanded GPU support unlocks a range of edge use cases that satisfy the stringent requirements of critical infrastructure for our healthcare, retail, manufacturing, government, defense, and intelligence customers. These may include real-time video analytics for public safety, predictive maintenance in industrial settings, accelerated medical diagnostics, and secure, low-latency inferencing for essential services such as energy production and critical infrastructure. The NVIDIA RTX PRO 6000 Blackwell also improves virtual desktop support by leveraging NVIDIA vGPU technology and Multi-Instance GPU (MIG) capabilities. This not only accommodates higher user density but also powers AI-enhanced graphics and visual compute capabilities, offering an efficient solution for demanding virtual environments.
Earlier this year, Microsoft announced a multitude of AI capabilities at the edge, each enriched with NVIDIA accelerated computing:
- Edge Retrieval Augmented Generation (RAG): Empowers sovereign AI deployments with fast, secure, and scalable inferencing on local data, supporting mission-critical use cases across government, healthcare, and industrial automation.
- Azure AI Video Indexer enabled by Azure Arc: Enables real-time and recorded video analytics in disconnected environments, ideal for public safety and critical infrastructure monitoring or post-event analysis.
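To make the retrieve-then-generate pattern behind edge RAG concrete, here is a minimal Python sketch. It is purely illustrative: the keyword-overlap retriever and all names are stand-ins for the embedding search and local inference endpoint a real sovereign deployment would use.

```python
# Illustrative edge-RAG flow: retrieve local documents, then build an
# augmented prompt for an on-premises model. Hypothetical sketch only;
# a real deployment would use embeddings and a local inference endpoint.

def tokenize(text):
    return set(text.lower().split())

def retrieve(query, documents, k=1):
    """Rank local documents by naive token overlap with the query."""
    scored = sorted(documents,
                    key=lambda d: len(tokenize(d) & tokenize(query)),
                    reverse=True)
    return scored[:k]

def build_prompt(query, context_docs):
    """Assemble the augmented prompt sent to the local model."""
    context = "\n".join(context_docs)
    return f"Answer using only this local context:\n{context}\n\nQuestion: {query}"

docs = [
    "Turbine 7 vibration exceeded threshold during the night shift.",
    "Cafeteria menu for Friday includes soup and salad.",
]
query = "Which turbine exceeded the vibration threshold?"
top = retrieve(query, docs)
prompt = build_prompt(query, top)
```

Because retrieval and generation both stay on local hardware, no data leaves the disconnected environment, which is the point of the pattern.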
With Azure Local, customers can meet strict regulatory, data residency, and privacy requirements while harnessing the latest AI innovations powered by NVIDIA.
Whether you need ultra-low latency for business continuity, robust local inferencing, or compliance with industry regulations, we're dedicated to delivering cutting-edge AI performance wherever your data resides. Customers can now access the breakthrough performance of NVIDIA RTX PRO 6000 Blackwell GPUs in new Azure Local solutions, including the Dell AX-770, HPE ProLiant DL380 Gen12, and Lenovo ThinkAgile MX650a V4.
To find out more about upcoming availability and sign up for early ordering, visit:
Powering the future of AI with new models in Azure AI Foundry
At Microsoft, we're committed to bringing the most advanced AI capabilities to our customers, wherever they need them. Through our partnership with NVIDIA, Azure AI Foundry now brings world-class multimodal reasoning models directly to enterprises, deployable anywhere as secure, scalable NVIDIA NIM™ microservices. The portfolio spans a range of use cases:
NVIDIA Nemotron Family: High-accuracy open models and datasets for agentic AI
- Llama Nemotron Nano VL 8B is available now and is tailored for multimodal vision-language tasks, document intelligence and understanding, and mobile and edge AI agents.
- NVIDIA Nemotron Nano 9B is available now and supports enterprise agents, scientific reasoning, advanced math, and coding for software engineering and tool calling.
- NVIDIA Llama 3.3 Nemotron Super 49B 1.5 is coming soon and is designed for enterprise agents, scientific reasoning, advanced math, and coding for software engineering and tool calling.
NVIDIA Cosmos Family: Open world foundation models for physical AI
- Cosmos Reason-1 7B is available now and supports robotics planning and decision making, training data curation and annotation for autonomous vehicles, and video analytics AI agents that extract insights and perform root-cause analysis from video data.
- NVIDIA Cosmos Predict 2.5 is coming soon and is a generalist model for world state generation and prediction.
- NVIDIA Cosmos Transfer 2.5 is coming soon and is designed for structural conditioning and physical AI.
Microsoft TRELLIS by Microsoft Research: High-quality 3D asset generation
- Microsoft TRELLIS by Microsoft Research is available now and enables digital twins by generating accurate 3D assets from simple prompts, immersive retail experiences with photorealistic product models for AR and virtual try-ons, and game and simulation development by turning creative ideas into production-ready 3D content.
Together, these open models reflect the depth of the Azure and NVIDIA partnership: combining Microsoft's adaptive cloud with NVIDIA's leadership in accelerated computing to power the next generation of agentic AI for every industry. Learn more about the models here.
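Since NIM microservices expose an OpenAI-compatible chat completions API, invoking one of these deployed models can be sketched in a few lines of Python. The endpoint URL, model name, and key below are placeholders, not real values, and the model identifier is an assumption for illustration.

```python
# Sketch of calling a NIM microservice deployed from Azure AI Foundry.
# NIM endpoints expose an OpenAI-compatible chat completions API; the
# endpoint, API key, and model name here are hypothetical placeholders.
import json
import urllib.request

def build_chat_request(model, prompt, max_tokens=256):
    """Build an OpenAI-compatible chat completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def post_chat(endpoint, api_key, payload):
    """POST the payload to the deployed endpoint (network call, not run here)."""
    req = urllib.request.Request(
        f"{endpoint}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("nvidia/nemotron-nano-9b",
                             "Summarize this contract clause.")
# post_chat("https://<your-endpoint>", "<your-key>", payload)
```

Because the interface is OpenAI-compatible, existing client code can usually be pointed at the NIM endpoint by changing only the base URL and credentials.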
Maximizing GPU utilization for enterprise AI with NVIDIA Run:ai on Azure
As an AI workload and GPU orchestration platform, NVIDIA Run:ai helps organizations make the most of their compute investments, accelerating AI development cycles and driving faster time-to-market for new insights and capabilities. By bringing NVIDIA Run:ai to Azure, we're giving enterprises the ability to dynamically allocate, share, and manage GPU resources across teams and workloads, helping them get more from every GPU.
NVIDIA Run:ai on Azure integrates seamlessly with core Azure services, including Azure NC and ND series instances, Azure Kubernetes Service (AKS), and Azure Identity Management, and offers compatibility with Azure Machine Learning and Azure AI Foundry for unified, enterprise-ready AI orchestration. We're bringing hybrid scale to life to help customers transform static infrastructure into a flexible, shared resource for AI innovation.
With smarter orchestration and cloud-ready GPU pooling, teams can drive faster innovation, reduce costs, and unleash the power of AI across their organizations with confidence. NVIDIA Run:ai on Azure enhances AKS with GPU-aware scheduling, helping teams allocate, share, and prioritize GPU resources more efficiently. Operations are streamlined with one-click job submission, automated queueing, and built-in governance. This ensures teams spend less time managing infrastructure and more time focused on building what's next.
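The pooling-and-queueing idea behind GPU-aware scheduling can be illustrated with a toy allocator: jobs request (possibly fractional) GPUs, run when capacity exists, and queue otherwise. This is a conceptual sketch of the policy, not the Run:ai API.

```python
# Toy GPU pool with fractional allocation and FIFO queueing, illustrating
# the kind of policy an orchestrator applies cluster-wide. Conceptual
# sketch only; not the Run:ai API or scheduler implementation.
from collections import deque

class GpuPool:
    def __init__(self, total_gpus):
        self.free = float(total_gpus)
        self.queue = deque()
        self.running = {}

    def submit(self, job, gpus):
        """Admit the job if capacity allows; otherwise queue it."""
        if gpus <= self.free:
            self.free -= gpus
            self.running[job] = gpus
        else:
            self.queue.append((job, gpus))

    def release(self, job):
        """Return a job's GPUs to the pool, then admit queued work in order."""
        self.free += self.running.pop(job)
        while self.queue and self.queue[0][1] <= self.free:
            nxt, g = self.queue.popleft()
            self.free -= g
            self.running[nxt] = g

pool = GpuPool(4)
pool.submit("train-a", 2.5)   # fractional share of the pool
pool.submit("infer-b", 1.0)
pool.submit("train-c", 2.0)   # queued: only 0.5 GPUs free
pool.release("train-a")       # frees 2.5 GPUs; train-c is admitted
```

Fractional requests are what let several small inference jobs share one physical GPU instead of each idling most of a dedicated card.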
This impact spans industries, supporting the infrastructure and orchestration behind transformative AI workloads at every stage of enterprise growth:
- Healthcare organizations can use NVIDIA Run:ai on Azure to advance medical imaging analysis and drug discovery workloads across hybrid environments.
- Financial services organizations can orchestrate and scale GPU clusters for complex risk simulations and fraud detection models.
- Manufacturers can accelerate training of computer vision models for improved quality control and predictive maintenance in their factories.
- Retail companies can power real-time recommendation systems for more personalized experiences through efficient GPU allocation and scaling, ultimately better serving their customers.
Reimagining AI at scale: First to deploy an NVIDIA GB300 NVL72 supercomputing cluster
Microsoft is redefining AI infrastructure with the new NDv6 GB300 VM series, delivering the first at-scale production cluster of NVIDIA GB300 NVL72 systems, featuring over 4,600 NVIDIA Blackwell Ultra GPUs connected via NVIDIA Quantum-X800 InfiniBand networking. Each NVIDIA GB300 NVL72 rack integrates 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace™ CPUs, delivering over 130 TB/s of NVLink bandwidth and up to 136 kW of compute power in a single cabinet. Designed for the most demanding workloads (reasoning models, agentic systems, and multimodal AI), GB300 NVL72 combines ultra-dense compute, direct liquid cooling, and smart rack-scale management to deliver breakthrough efficiency and performance within a modular datacenter footprint.
Azure's co-engineered infrastructure enhances GB300 NVL72 with technologies like Azure Boost for accelerated I/O and integrated hardware security modules (HSMs) for enterprise-grade protection. Each rack arrives pre-integrated and self-managed, enabling rapid, repeatable deployment across Azure's global fleet. As the first cloud provider to deploy NVIDIA GB300 NVL72 at scale, Microsoft is setting a new standard for AI supercomputing, empowering organizations to train and deploy frontier models faster, more efficiently, and more securely than ever before. Together, Azure and NVIDIA are powering the future of AI.
Learn more about Microsoft's systems approach to delivering GB300 NVL72 on Azure.
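For a sense of scale, the per-rack figures quoted above imply roughly how many racks and Grace CPUs such a cluster contains. The rack count below is an inference from "over 4,600" GPUs, so it is a lower-bound estimate rather than an official number.

```python
# Back-of-envelope math from the per-rack figures quoted in the text.
# The rack count is inferred from "over 4,600" GPUs, not an official figure.
GPUS_TOTAL = 4600          # "over 4,600" Blackwell Ultra GPUs
GPUS_PER_RACK = 72         # per GB300 NVL72 rack
CPUS_PER_RACK = 36         # NVIDIA Grace CPUs per rack
POWER_PER_RACK_KW = 136    # "up to 136 kW" per cabinet

racks = -(-GPUS_TOTAL // GPUS_PER_RACK)       # ceiling division: at least 64 racks
grace_cpus = racks * CPUS_PER_RACK            # ~2,300 Grace CPUs
peak_power_mw = racks * POWER_PER_RACK_KW / 1000  # ~8.7 MW peak, at full draw
```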
Unleashing the performance of ND GB200-v6 VMs with NVIDIA Dynamo
Our collaboration with NVIDIA focuses on optimizing every layer of the computing stack to help customers maximize the value of their existing AI infrastructure investments.
To deliver high-performance inference for compute-intensive reasoning models at scale, we're bringing together a solution that combines the open-source NVIDIA Dynamo framework, our ND GB200-v6 VMs with NVIDIA GB200 NVL72, and Azure Kubernetes Service (AKS). We've demonstrated the performance this combined solution delivers at scale, with the gpt-oss 120b model processing 1.2 million tokens per second in a production-ready, managed AKS cluster, and have published a deployment guide so developers can get started today.
Dynamo is an open-source, distributed inference framework designed for multi-node environments and rack-scale accelerated compute architectures. By enabling disaggregated serving, LLM-aware routing, and KV caching, Dynamo significantly boosts performance for reasoning models on Blackwell, unlocking up to 15x more throughput compared to the prior Hopper generation and opening new revenue opportunities for AI service providers.
These efforts enable AKS production customers to take full advantage of NVIDIA Dynamo's inference optimizations when deploying frontier reasoning models at scale. We're dedicated to bringing the latest open-source software innovations to our customers, helping them fully realize the potential of the NVIDIA Blackwell platform on Azure.
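The intuition behind LLM-aware routing can be shown with a toy example: send each request to the worker whose KV cache already covers the longest prefix of the prompt, so prefill computation is reused rather than repeated. This mirrors the idea, not Dynamo's actual router code.

```python
# Conceptual sketch of KV-cache-aware (LLM-aware) routing: pick the worker
# with the most cached prefix tokens for an incoming prompt. Illustrative
# only; Dynamo's real router and data structures differ.

def shared_prefix_len(a, b):
    """Length of the common token prefix between two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(prompt_tokens, workers):
    """Choose the worker whose cache best matches this prompt's prefix."""
    def cached(worker):
        return max((shared_prefix_len(prompt_tokens, p)
                    for p in worker["cached_prompts"]), default=0)
    return max(workers, key=cached)

workers = [
    {"name": "w0", "cached_prompts": [["sys", "you", "are", "helpful"]]},
    {"name": "w1", "cached_prompts": [["sys", "summarize", "the", "report"]]},
]
best = route(["sys", "summarize", "this", "memo"], workers)
```

In a disaggregated deployment the same idea also steers requests between prefill and decode workers, which is where much of the throughput gain comes from.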
Learn more about Dynamo on AKS.
Get more AI resources
- Join us in San Francisco at Microsoft Ignite in November to hear about the latest in enterprise solutions and innovation.
- Explore Azure AI Foundry and Azure Local.