Microsoft announces Maia 200 AI accelerator built for inference

Microsoft Blogs:

Today, we’re proud to introduce Maia 200, a breakthrough inference accelerator engineered to dramatically improve the economics of AI token generation. Maia 200 is an AI inference powerhouse: an accelerator built on TSMC’s 3nm process with native FP8/FP4 tensor cores, a redesigned memory system with 216GB HBM3e at 7 TB/s and 272MB of on-chip SRAM, plus data movement engines that keep massive models fed, fast and highly utilized. This makes Maia 200 the most performant, first-party silicon from any hyperscaler, with three times the FP4 performance of the third generation Amazon Trainium, and FP8 performance above Google’s seventh generation TPU. Maia 200 is also the most efficient inference system Microsoft has ever deployed, with 30% better performance per dollar than the latest generation hardware in our fleet today.

Maia 200 is part of our heterogeneous AI infrastructure and will serve multiple models, including the latest GPT-5.2 models from OpenAI, bringing a performance-per-dollar advantage to Microsoft Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team will use Maia 200 for synthetic data generation and reinforcement learning to improve next-generation in-house models. For synthetic data pipeline use cases, Maia 200’s unique design helps accelerate the rate at which high-quality, domain-specific data can be generated and filtered, feeding downstream training with fresher, more targeted signals.

Maia 200 is deployed in our US Central datacenter region near Des Moines, Iowa, with the US West 3 datacenter region near Phoenix, Arizona, coming next and future regions to follow. Maia 200 integrates seamlessly with Azure, and we are previewing the Maia SDK with a complete set of tools to build and optimize models for Maia 200. The SDK includes PyTorch integration, a Triton compiler and optimized kernel library, and access to Maia’s low-level programming language. This gives developers fine-grained control when needed while enabling easy model porting across heterogeneous hardware accelerators.

Engineered for AI inference

Fabricated on TSMC’s cutting-edge 3-nanometer process, each Maia 200 chip contains over 140 billion transistors and is tailored for large-scale AI workloads while also delivering efficient performance per dollar. On both fronts, Maia 200 is built to excel. It is designed for the latest models using low-precision compute, with each Maia 200 chip delivering over 10 petaFLOPS in 4-bit precision (FP4) and over 5 petaFLOPS of 8-bit (FP8) performance, all within a 750W SoC TDP envelope. In practical terms, Maia 200 can effortlessly run today’s largest models, with plenty of headroom for even bigger models in the future.
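
As a rough check on those figures, the quoted peak throughput and the 750W TDP can be combined into a peak efficiency number. This is a back-of-the-envelope sketch in Python, not a measured result: the announcement says "over" 10 and "over" 5 petaFLOPS, so the values below are lower bounds on peak, and the function and variable names are made up for illustration.

```python
# Peak figures quoted in the announcement (stated as "over", so lower bounds).
FP4_PFLOPS = 10.0   # petaFLOPS at 4-bit precision
FP8_PFLOPS = 5.0    # petaFLOPS at 8-bit precision
TDP_WATTS = 750.0   # SoC TDP envelope


def tflops_per_watt(pflops: float, watts: float) -> float:
    """Convert a peak petaFLOPS figure into teraFLOPS per watt of TDP."""
    return pflops * 1000.0 / watts


fp4_eff = tflops_per_watt(FP4_PFLOPS, TDP_WATTS)
fp8_eff = tflops_per_watt(FP8_PFLOPS, TDP_WATTS)

print(f"FP4: at least {fp4_eff:.1f} peak TFLOPS per watt")
print(f"FP8: at least {fp8_eff:.1f} peak TFLOPS per watt")
```

That works out to roughly 13.3 TFLOPS/W at FP4 and 6.7 TFLOPS/W at FP8. Peak numbers like these say nothing about sustained utilization on a real model.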

A close-up of the Maia 200 AI accelerator chip.

Crucially, FLOPS aren’t the only ingredient for faster AI. Feeding data is equally important. Maia 200 attacks this bottleneck with a redesigned memory subsystem. The Maia 200 memory subsystem is centered on narrow-precision datatypes, a specialized DMA engine, on-die SRAM and a specialized NoC fabric for high‑bandwidth data movement, increasing token throughput.
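
One way to see why feeding data matters: during single-stream token generation, a dense model's weights are typically read from memory once per token, so memory bandwidth caps tokens per second. The sketch below applies that textbook estimate to the 216GB and 7 TB/s HBM3e figures from the announcement. The 400-billion-parameter model is a hypothetical example, and the estimate ignores KV-cache traffic, batching, and any reuse from the 272MB of on-chip SRAM, so treat it as an illustration, not a benchmark.

```python
# Figures quoted in the announcement.
HBM_BANDWIDTH_BPS = 7e12     # 7 TB/s of HBM3e bandwidth
HBM_CAPACITY_BYTES = 216e9   # 216 GB of HBM3e
FP4_PEAK_FLOPS = 10e15       # "over 10 petaFLOPS" at FP4, used as a lower bound


def weight_bytes(num_params: float, bits_per_param: int) -> float:
    """Bytes needed to store the weights at the given precision."""
    return num_params * bits_per_param / 8


def decode_tokens_per_second_ceiling(num_params: float, bits_per_param: int) -> float:
    """Bandwidth-only ceiling: every weight is read once per generated token."""
    return HBM_BANDWIDTH_BPS / weight_bytes(num_params, bits_per_param)


def ridge_point_flops_per_byte() -> float:
    """Arithmetic intensity above which the chip is compute-bound, not memory-bound."""
    return FP4_PEAK_FLOPS / HBM_BANDWIDTH_BPS


# Hypothetical dense model: 400 billion parameters stored in FP4.
params = 400e9
size = weight_bytes(params, 4)
fits = size <= HBM_CAPACITY_BYTES
ceiling = decode_tokens_per_second_ceiling(params, 4)

print(f"Weights: {size / 1e9:.0f} GB, fits in HBM: {fits}")
print(f"Single-stream decode ceiling: {ceiling:.0f} tokens/s")
print(f"Ridge point: {ridge_point_flops_per_byte():.0f} FLOPs per byte")
```

The ridge point of roughly 1,400 FLOPs per byte shows how much work each byte fetched from HBM has to support before the FP4 units become the limit, which is why batching and on-chip reuse matter so much for inference.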

A table with the title “Industry-leading capability” shows peak specifications for Azure Maia 200, AWS Trainium 3 and Google TPU v7.

Optimized AI systems

At the systems level, Maia 200 introduces a novel, two-tier scale-up network design built on standard Ethernet. A custom transport layer and tightly integrated NIC unlock performance, strong reliability and significant cost advantages without relying on proprietary fabrics.

Each accelerator exposes:

2.8 TB/s of bidirectional, dedicated scale-up bandwidth
Predictable, high-performance collective operations across clusters of up to 6,144 accelerators

This architecture delivers scalable performance for dense inference clusters while reducing power usage and overall TCO across Azure’s global fleet.
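
To give a feel for what 2.8 TB/s per accelerator means for collectives, here is a bandwidth-only estimate using the textbook ring all-reduce cost model. The announcement does not say which collective algorithm Maia 200 uses, so the ring model is an assumption, as is splitting the bidirectional figure evenly into 1.4 TB/s per direction. Latency, protocol overhead and the two-tier topology are ignored.

```python
# Figure quoted in the announcement.
SCALE_UP_BIDIR_BPS = 2.8e12

# Assumption: the bidirectional figure splits evenly between send and receive.
PER_DIRECTION_BPS = SCALE_UP_BIDIR_BPS / 2


def ring_allreduce_seconds(message_bytes: float,
                           num_accelerators: int,
                           link_bps: float = PER_DIRECTION_BPS) -> float:
    """Bandwidth-only time for a ring all-reduce.

    In the standard ring algorithm each participant sends
    2 * (N - 1) / N times the message size in total.
    """
    if num_accelerators < 2:
        return 0.0  # nothing to reduce across
    bytes_sent = 2 * (num_accelerators - 1) / num_accelerators * message_bytes
    return bytes_sent / link_bps


# Example: reduce a 1 GB buffer across 4 accelerators.
t = ring_allreduce_seconds(1e9, 4)
print(f"Estimated all-reduce time: {t * 1e3:.2f} ms")
```

Real collectives on a 6,144-accelerator cluster would also pay per-hop latency, so this only bounds the bandwidth term.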

Within each tray, four Maia accelerators are fully connected with direct, non‑switched links, keeping high‑bandwidth communication local for optimal inference efficiency. The same communication protocols are used for intra-rack and inter-rack networking using the Maia AI transport protocol, enabling seamless scaling across nodes, racks and clusters of accelerators with minimal network hops. This unified fabric simplifies programming, improves workload flexibility and reduces stranded capacity while maintaining consistent performance and cost efficiency at cloud scale.
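
The tray layout described above is a full mesh, and the link count follows directly from that. The short sketch below enumerates the direct links in a four-accelerator tray and counts how many such trays a maximum-size cluster would contain. It assumes the 6,144-accelerator cluster is made up entirely of four-accelerator trays, which the announcement implies but does not state outright. The names are illustrative.

```python
from itertools import combinations

# Figures quoted in the announcement.
ACCELERATORS_PER_TRAY = 4
MAX_CLUSTER_ACCELERATORS = 6144


def full_mesh_links(num_nodes: int) -> list:
    """Every unordered pair of nodes gets one direct, non-switched link."""
    return list(combinations(range(num_nodes), 2))


links = full_mesh_links(ACCELERATORS_PER_TRAY)
peers_per_accelerator = ACCELERATORS_PER_TRAY - 1

# Assumption: the whole cluster is built from identical four-accelerator trays.
trays = MAX_CLUSTER_ACCELERATORS // ACCELERATORS_PER_TRAY

print(f"Direct links per tray: {len(links)} -> {links}")
print(f"Direct peers per accelerator: {peers_per_accelerator}")
print(f"Trays in a maximum-size cluster: {trays}")
```

Six links per tray means any two accelerators in a tray are one hop apart, which is where the local, non-switched bandwidth comes from.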

A top-down view of the Maia 200 server blade.

A cloud-native development approach

A core principle of Microsoft’s silicon development programs is to validate as much of the end-to-end system as possible ahead of final silicon availability.

A sophisticated pre-silicon environment guided the Maia 200 architecture from its earliest stages, modeling the computation and communication patterns of LLMs with high fidelity. This early co-development environment enabled us to optimize silicon, networking and system software as a unified whole, long before first silicon.

We also designed Maia 200 for fast, seamless availability in the datacenter from the beginning, building out early validation of some of the most complex system elements, including the backend network and our second-generation, closed loop, liquid cooling Heat Exchanger Unit. Native integration with the Azure control plane delivers security, telemetry, diagnostics and management capabilities at both the chip and rack levels, maximizing reliability and uptime for production-critical AI workloads.

As a result of these investments, AI models were running on Maia 200 silicon within days of first packaged part arrival. Time from first silicon to first datacenter rack deployment was reduced to less than half that of comparable AI infrastructure programs. And this end-to-end approach, from chip to software to datacenter, translates directly into higher utilization, faster time to production and sustained improvements in performance per dollar and per watt at cloud scale.

A view of the Maia 200 rack and the HXU cooling unit.

Sign up for the Maia SDK preview

The era of large-scale AI is just beginning, and infrastructure will define what’s possible. Our Maia AI accelerator program is designed to be multi-generational. As we deploy Maia 200 across our global infrastructure, we are already designing for future generations and expect each generation will continually set new benchmarks for what’s possible and deliver ever better performance and efficiency for the most important AI workloads.

Today, we’re inviting developers, AI startups and academics to begin exploring early model and workload optimization with the new Maia 200 software development kit (SDK). The SDK includes a Triton compiler, support for PyTorch, low-level programming in NPL and a Maia simulator and cost calculator to optimize for efficiencies earlier in the code lifecycle. Sign up for the preview here.

Get more photos, video and resources on our Maia 200 site and read more details.

Source:

Maia 200: The AI accelerator built for inference - The Official Microsoft Blog

blogs.microsoft.com

PhilB

Well-known member

Power User

VIP

Jan 26, 2026

#2

2 QUESTIONS.

What kind of volume does Microsoft hope to get, to justify the large development cost?

What does this mean for NVidia (and AMD)?

My Computer

At a glance

Windows 11 2H25, AMD 9900X, 64 GB, AMD 9070 XT

OS: Windows 11 2H25

Computer type: PC/Desktop

Manufacturer/Model: DIY

CPU: AMD 9900X

Motherboard: MSI X870E Carbon

Memory: 64 GB

Graphics Card(s): AMD 9070 XT

Sound Card: built-in

Monitor(s) Displays: Dell 24"

Hard Drives: Sabrent 1 TB NVMe, 4 x SSD (need to check models), 4 x 3.5" HDD, 8-16 TB, all WD

PSU: Seasonic 850

Case: Fractal Design North XL (which I like)

Cooling: Corsair AIO for CPU, fans for case

Keyboard: Das Keyboard 4

Mouse: Corsair M65 (white)

Internet Speed: 1 TB download

Browser: Firefox

Antivirus: Bitdefender

Other Info: Also have Lenovo T14S laptop (me) and Lenovo Slim 71 (wife)

PurSpyk!!

Well-known member

Power User

VIP

Jan 26, 2026

#3

All seems a bit pointless considering how dominant NVidia is in the AI space

My Computers

System One

At a glance
Windows 11 25H2, Intel i9 14900KF, 64GB Corsair Vengeance RGB, MSI 4090 Suprim X

OS: Windows 11 25H2
Computer type: PC/Desktop
Manufacturer/Model: Custom Built
CPU: Intel i9 14900KF
Motherboard: ASUS Z790 ProArt Creator WiFi
Memory: 64GB Corsair Vengeance RGB
Graphics Card(s): MSI 4090 Suprim X
Sound Card: Onboard
Monitor(s) Displays: 1 x Asus 24", 1 x Asus 32"
Screen Resolution: 1920 x 1080 & 2560 x 1440
Hard Drives: Multiple
PSU: Corsair 1200HX
Case: Corsair 7000D RGB
Cooling: Corsair H150I Capellix XT
Keyboard: Corsair K70 RGB PRO
Mouse: Corsair M55 RGB Pro
Internet Speed: 1000Mbps
Browser: Edge
Antivirus: Windows Default

System Two

At a glance
Windows 11 25H2, Intel i7 6800K, 32GB DDR4 (Corsair), ASUS GTX 1080ti

Operating System: Windows 11 25H2
Computer type: PC/Desktop
Manufacturer/Model: Custom Built
CPU: Intel i7 6800K
Motherboard: ASUS Z99 Deluxe
Memory: 32GB DDR4 (Corsair)
Graphics card(s): ASUS GTX 1080ti
Sound Card: Onboard
Monitor(s) Displays: 1 x Viewsonic 24", 1 x LG 19"
Screen Resolution: 1920 x 1080 & 1600 x 900
Hard Drives: 3 x SATA SSD
PSU: 650W Gigabyte Bronze
Case: Coolermaster HAF-X
Cooling: Noctua NH-15 Chroma black
Keyboard: Generic RGB
Mouse: Microsoft Basic
Internet Speed: 1000Mbps
Browser: Edge
Antivirus: Windows Default

PhilB

Well-known member

Power User

VIP

Jan 26, 2026

#4

PurSpyk!! said:
All seems a bit pointless considering how dominant NVidia is in the AI space

Yes, for today and probably into 2027. Remember that Yahoo was once dominant in search. I'm not saying that NVidia will be in trouble by 2028, but anything could happen. Maybe Google releases the NVidia-killer chip. Or maybe some Chinese company.
