
“In practical terms, Maia 200 can effortlessly run today’s largest models, with plenty of headroom for even bigger models in the future,” Microsoft says.
Maia 200 also feeds data to models differently, through what Microsoft describes as a redesigned memory subsystem featuring a specialized direct memory access (DMA) engine, on-die static random-access memory (SRAM), and a specialized network-on-chip (NoC) fabric. Together, these enable high-bandwidth data movement and higher token throughput.
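Microsoft has not published the low-level details of that subsystem, but the general idea behind pairing a DMA engine with on-chip SRAM is well established: stage the next chunk of data while the current one is being computed on, so the math units never stall waiting on memory. The Python sketch below illustrates that double-buffering pattern in the abstract only; the tiles, latencies, and queue sizes are purely illustrative and are not tied to Maia 200's actual design.

```python
import threading
import queue
import time

# Illustrative double-buffering sketch (not Maia-specific): a "DMA" thread
# prefetches the next tile of data into a small on-chip scratchpad while the
# compute loop works on the current tile, so data movement overlaps with
# compute instead of blocking it.

def dma_prefetch(tiles, scratchpad: queue.Queue):
    """Simulated DMA engine: copies tiles from off-chip memory into the scratchpad."""
    for tile in tiles:
        time.sleep(0.01)          # stand-in for the off-chip -> SRAM transfer latency
        scratchpad.put(tile)
    scratchpad.put(None)          # signal end of stream

def compute(scratchpad: queue.Queue):
    """Consumer loop: processes whichever tile the DMA thread has already staged."""
    results = []
    while (tile := scratchpad.get()) is not None:
        time.sleep(0.01)          # stand-in for the math on this tile
        results.append(sum(tile))
    return results

tiles = [[i, i + 1, i + 2] for i in range(8)]   # toy data split into tiles
scratchpad = queue.Queue(maxsize=2)             # tiny on-chip buffer: double buffering
dma = threading.Thread(target=dma_prefetch, args=(tiles, scratchpad))
dma.start()
print(compute(scratchpad))                      # transfers and compute proceed in parallel
dma.join()
```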
Designed for heterogeneity, multimodal AI
Microsoft says it designed Maia 200 specifically with modern LLMs in mind; forward-thinking customers, it argues, are looking not just for text prompts but for multimodal capabilities (sound, images, video) that support deeper reasoning, multi-step agents, and, eventually, autonomous AI tasks.
As part of its heterogeneous AI infrastructure, Microsoft says that Maia 200 will serve multiple models, including OpenAI’s latest GPT-5.2 family. It integrates seamlessly with Microsoft Azure, and both Microsoft Foundry and Microsoft 365 Copilot will also benefit from the chip. The company’s superintelligence team plans to use Maia 200 for reinforcement learning (RL) and synthetic data generation to improve in-house models.
From a specification perspective, Maia 200 exceeds Amazon’s Trainium and Inferentia chips and Google’s TPU v4i and v5i, noted Scott Bickley, advisory fellow at Info-Tech Research Group. It is produced on a 3nm process node, versus the 7nm or 5nm nodes used for the Amazon and Google chips, and it also offers superior compute, interconnect, and memory capabilities, he said.
However, he noted, “while these numbers are impressive, customers should verify actual performance within the Azure stack prior to scaling out workloads away from Nvidia, as an example.” They should also ensure that part of the 30% savings being realized by Microsoft is passed through to customers via their Azure subscription charges, he added.