Microsoft delivers the first at-scale production cluster with more than 4,600 NVIDIA Blackwell Ultra GPUs in GB300 NVL72 rack-scale systems, connected through the next-generation NVIDIA InfiniBand network. This cluster is the first of many, as we scale to hundreds of thousands of Blackwell Ultra GPUs deployed across Microsoft’s AI datacenters globally, reflecting our continued commitment to redefining AI infrastructure and collaboration with NVIDIA. These massive-scale clusters of Blackwell Ultra GPUs will enable model training in weeks instead of months, while delivering high throughput for inference workloads. We are also unlocking bigger, more powerful models, and will be the first to support training models with hundreds of trillions of parameters.
This was made possible through collaboration across hardware, systems, supply chain, facilities, and multiple other disciplines, as well as with NVIDIA.
“Microsoft Azure’s launch of the NVIDIA GB300 NVL72 supercluster is an exciting step in the advancement of frontier AI. This co-engineered system delivers the world’s first at-scale GB300 production cluster, providing the supercomputing engine needed for OpenAI to serve multitrillion-parameter models. This sets the definitive new standard for accelerated computing.”
Ian Buck, Vice President of Hyperscale and High-performance Computing at NVIDIA

From NVIDIA GB200 to GB300: A new standard in AI performance
Earlier this year, Azure introduced ND GB200 v6 virtual machines (VMs), accelerated by NVIDIA’s Blackwell architecture. These quickly became the backbone of some of the most demanding AI workloads in the industry, including for organizations like OpenAI and Microsoft, which already use massive clusters of GB200 NVL72 on Azure to train and deploy frontier models.
Now, with ND GB300 v6 VMs, Azure is raising the bar again. These VMs are optimized for reasoning models, agentic AI systems, and multimodal generative AI. Built on a rack-scale system, each rack hosts 18 VMs with a total of 72 GPUs (a quick arithmetic sketch follows the list):
- 72 NVIDIA Blackwell Ultra GPUs (with 36 NVIDIA Grace CPUs).
- 800 gigabits per second (Gb/s) per GPU of cross-rack scale-out bandwidth via next-generation NVIDIA Quantum-X800 InfiniBand (2x GB200 NVL72).
- 130 terabytes (TB) per second of NVIDIA NVLink bandwidth within the rack.
- 37TB of fast memory.
- Up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core performance.
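
To make these numbers concrete, here is a small back-of-the-envelope sketch in Python, derived purely from the figures listed above; the four-GPUs-per-VM split follows from 72 GPUs across 18 VMs, and none of this comes from an official Azure sizing tool.

```python
# Back-of-the-envelope arithmetic for one ND GB300 v6 rack,
# using only the figures quoted in the list above.

GPUS_PER_RACK = 72            # NVIDIA Blackwell Ultra GPUs per rack
VMS_PER_RACK = 18             # ND GB300 v6 VMs per rack
SCALE_OUT_GBPS_PER_GPU = 800  # Quantum-X800 InfiniBand, per GPU
FP4_PFLOPS_PER_RACK = 1440    # peak FP4 Tensor Core performance

gpus_per_vm = GPUS_PER_RACK // VMS_PER_RACK                          # -> 4 GPUs per VM
rack_scale_out_tbps = GPUS_PER_RACK * SCALE_OUT_GBPS_PER_GPU / 1000  # -> 57.6 Tb/s aggregate
fp4_pflops_per_gpu = FP4_PFLOPS_PER_RACK / GPUS_PER_RACK             # -> 20 PFLOPS per GPU

print(f"GPUs per VM:              {gpus_per_vm}")
print(f"Rack scale-out bandwidth: {rack_scale_out_tbps:.1f} Tb/s aggregate")
print(f"FP4 peak per GPU:         {fp4_pflops_per_gpu:.0f} PFLOPS")
```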

Building for AI supercomputing at scale
Building infrastructure for frontier AI requires us to reimagine every layer of the stack (computing, memory, networking, datacenters, cooling, and power) as a unified system. The ND GB300 v6 VMs are a clear expression of this transformation, the product of years of collaboration across silicon, systems, and software.
At the rack level, NVLink and NVSwitch reduce memory and bandwidth constraints, enabling up to 130TB per second of intra-rack data transfer across a combined 37TB of fast memory. Each rack becomes a tightly coupled unit, delivering higher inference throughput at lower latencies on larger models and longer context windows, and making agentic and multimodal AI systems more responsive and scalable than ever.
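To illustrate why that pooled memory matters, here is a rough, hedged sketch of the weight footprint of multitrillion-parameter models at FP4 precision (half a byte per parameter); real deployments must also budget for KV cache, activations, and runtime overhead, so treat this as an approximation rather than a sizing guide.

```python
# Rough check: do FP4 weights of a multitrillion-parameter model fit
# in one rack's 37TB of fast memory? Illustrative only; KV cache,
# activations, and runtime overhead are deliberately ignored.

FAST_MEMORY_TB = 37
BYTES_PER_FP4_PARAM = 0.5  # 4-bit weights: half a byte per parameter

def fp4_weights_tb(params_in_trillions: float) -> float:
    """Approximate FP4 weight footprint in TB (1 TB = 1e12 bytes)."""
    return params_in_trillions * 1e12 * BYTES_PER_FP4_PARAM / 1e12

for trillions in (2, 5, 10):
    tb = fp4_weights_tb(trillions)
    verdict = "fits" if tb <= FAST_MEMORY_TB else "does not fit"
    print(f"{trillions}T parameters -> ~{tb:.1f}TB of weights ({verdict} in {FAST_MEMORY_TB}TB)")
```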
To scale beyond the rack, Azure deploys a full fat-tree, non-blocking architecture built on NVIDIA Quantum-X800 InfiniBand, the fastest networking fabric available today. This lets customers scale training of ultra-large models efficiently to tens of thousands of GPUs with minimal communication overhead, delivering better end-to-end training throughput. Reduced synchronization overhead also translates into higher GPU utilization, which helps researchers iterate faster and at lower cost despite the compute-hungry nature of AI training workloads. Azure’s co-engineered stack, including custom protocols, collective libraries, and in-network computing, ensures the network is highly reliable and fully utilized by applications. Features like NVIDIA SHARP accelerate collective operations and double effective bandwidth by performing math in the switch, making large-scale training and inference more efficient and reliable.
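The collective operations referenced here are what training frameworks exercise through libraries such as NCCL. Below is a minimal, hypothetical PyTorch sketch of the all-reduce pattern that in-network computing offloads into the switch; the SHARP-enabling environment variable mentioned in the comments is an assumption based on NVIDIA’s NCCL plugin conventions, so verify it against your cluster’s documentation.

```python
# Minimal all-reduce with PyTorch + NCCL: the collective pattern that
# in-network computing (e.g., NVIDIA SHARP) offloads into the switch.
# Launch (hypothetical): torchrun --nproc_per_node=4 allreduce_demo.py
# Enabling SHARP offload typically goes through the NCCL SHARP plugin,
# e.g. NCCL_COLLNET_ENABLE=1 (assumption; check your cluster's docs).

import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")  # torchrun supplies rank/world size
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Each rank contributes a gradient-like tensor; all_reduce sums it
    # across all GPUs, the bandwidth-critical step of data-parallel training.
    grad = torch.full((1024, 1024), float(rank), device="cuda")
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)

    if rank == 0:
        expected = sum(range(dist.get_world_size()))  # 0 + 1 + ... + (n-1)
        print(f"all-reduce result {grad[0, 0].item():.0f} == expected {expected}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Because the switch sums partial results in flight, each GPU sends its data once instead of exchanging it over multiple ring steps, which is where the doubling of effective bandwidth comes from.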
Azure’s advanced cooling systems use standalone heat exchanger units and facility cooling to minimize water usage while maintaining thermal stability for dense, high-performance clusters like GB300 NVL72. We also continue to develop and deploy new power distribution models capable of supporting the high energy density and dynamic load balancing required by the ND GB300 v6 class of GPU clusters.
Further, our reengineered software stacks for storage, orchestration, and scheduling are optimized to fully utilize computing, networking, storage, and datacenter infrastructure at supercomputing scale, delivering unprecedented levels of performance with high efficiency to our customers.

Looking ahead
Microsoft has invested in AI infrastructure for years, enabling rapid adoption of and transition to the newest technology. That is why Azure is uniquely positioned to deliver GB300 NVL72 infrastructure at production scale and at a rapid pace, meeting the demands of frontier AI today.
As Azure continues to ramp up GB300 deployments worldwide, customers can expect to train and deploy new models in a fraction of the time required by previous generations. The ND GB300 v6 VMs are poised to become the new standard for AI infrastructure, and Azure is proud to lead the way, supporting customers as they advance frontier AI development.
Stay tuned for more updates and performance benchmarks as Azure expands production deployment of NVIDIA GB300 NVL72 globally.