Microsoft Maia 200 Chip to Power OpenAI GPT-5.2 and Cut AI Costs

Microsoft unveils Maia 200 AI chip for faster, cheaper inference, powering OpenAI’s GPT-5.2 in Azure.

Microsoft unveiled the Maia 200 AI accelerator on Monday, a custom chip optimized for inference workloads in large language models (LLMs). As the successor to the 2023 Maia 100, it is rolling out across US Azure data centers with a promised 30% better performance per dollar than current hardware. Built on TSMC's 3nm process with over 140 billion transistors, it targets real-time AI applications such as chatbots and generative tools.

Inference-Optimized Architecture

Maia 200 excels at low-precision formats such as FP4 and FP8, hitting over 10 petaFLOPS in FP4 mode and 5 petaFLOPS in FP8 for faster, more energy-efficient responses. It packs 216GB of HBM3e memory at 7TBps bandwidth plus 272MB of on-chip SRAM to eliminate data bottlenecks during model serving. Custom memory and communication designs keep LLMs "fed" with inputs, whether handling a single query or enterprise-scale traffic.
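To make the low-precision trade-off concrete, here is a minimal sketch using PyTorch's native FP8 dtype; this is generic PyTorch, not the Maia 200 SDK, and the matrix sizes are illustrative.

```python
import torch

# FP8 halves the bytes per parameter versus FP16, so roughly twice the
# parameters fit in the same 216GB of HBM3e (FP4 halves it again).
weights_fp16 = torch.randn(4096, 4096, dtype=torch.float16)
weights_fp8 = weights_fp16.to(torch.float8_e4m3fn)

print(f"FP16: {weights_fp16.element_size() * weights_fp16.numel() / 2**20:.0f} MiB")
print(f"FP8:  {weights_fp8.element_size() * weights_fp8.numel() / 2**20:.0f} MiB")

# At serving time the weights are dequantized (or, on hardware with FP8
# matmul units, consumed directly) in the forward pass.
activations = torch.randn(1, 4096)
output = activations @ weights_fp8.to(torch.float32)
```

Lower precision cuts both memory footprint and memory traffic per token, which is why the article frames FP4/FP8 throughput as the headline inference numbers.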


Massive Scalability for Data Centers

Up to 6,144 Maia 200 chips interconnect over standard Ethernet with 2.8TBps of bi-directional bandwidth per chip, enabling rack-scale clusters without exotic networking. This supports hyperscale inference for services such as Copilot and GPT-5.2, which Microsoft has confirmed will use the chip for lower latency and cost. Power efficiency matters most in dense deployments, where the chip addresses AI's surging energy demands amid global data center capacity constraints.
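For a rough sense of scale, the sketch below combines the cluster figures above; the trillion-parameter model size and FP4 packing are illustrative assumptions, not Microsoft's published sizing, and KV-cache and activation memory are ignored.

```python
# Back-of-the-envelope cluster math from the announced figures.
chips_per_cluster = 6_144
hbm_per_chip_gb = 216
link_bw_tbps = 2.8          # bi-directional Ethernet bandwidth per chip

params = 1e12               # a hypothetical trillion-parameter model
bytes_per_param_fp4 = 0.5   # FP4 packs two weights per byte
model_gb = params * bytes_per_param_fp4 / 1e9        # ~500 GB of weights

chips_for_weights = -(-model_gb // hbm_per_chip_gb)  # ceiling division
aggregate_bw = chips_per_cluster * link_bw_tbps

print(f"Model weights in FP4: {model_gb:.0f} GB -> {chips_for_weights:.0f} chips")
print(f"Aggregate cluster bandwidth: {aggregate_bw:,.1f} TBps")
```

Even this crude estimate shows why the article emphasizes memory capacity and interconnect together: a trillion-parameter model's weights fit on a handful of chips, so most of a 6,144-chip cluster goes to replicas serving concurrent traffic.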

Developer Tools and OpenAI Tie-In

A preview Maia 200 SDK offers a Triton compiler, PyTorch integration, optimized kernels, and low-level APIs for model tuning, inviting startups to port and optimize their workloads. Microsoft positions the chip as OpenAI-exclusive for GPT-5.2 at launch, accelerating inference rather than the training phase still dominated by Nvidia. Early benchmarks show it serving trillion-parameter models with sub-second responses, a significant step for real-world AI deployment.
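Since the SDK bundles a Triton compiler, a minimal Triton kernel gives a feel for the programming model it would compile. This is standard open-source Triton, not Maia-specific code; any Maia lowering would happen in the (preview) backend.

```python
import torch
import triton
import triton.language as tl

# Fused elementwise add: each program instance handles one block of elements.
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Under stock Triton this runs on a CUDA GPU; the pitch of a Triton-based SDK is that the same kernel source can be retargeted to new silicon without rewriting it in a vendor-specific language.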

Cost and Performance Edge

By focusing on inference, where most AI operational costs accrue, Maia 200 cuts expenses through dense packing and low-precision arithmetic, outperforming its predecessor on tokens-per-dollar metrics. Deployment starts in US data centers before expanding globally, reducing Microsoft's reliance on third-party silicon. The in-house push echoes Amazon's Trainium and Google's TPUs, signaling Big Tech's era of silicon sovereignty.
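The tokens-per-dollar metric itself is simple arithmetic, sketched below; the throughput and hourly cost are hypothetical placeholders, and only the 30% figure comes from the announcement.

```python
# Tokens-per-dollar: sustained throughput divided by the hourly serving cost.
baseline_tokens_per_sec = 10_000       # hypothetical baseline throughput
baseline_cost_per_hour = 10.0          # hypothetical USD/hour

baseline_tpd = baseline_tokens_per_sec * 3600 / baseline_cost_per_hour
maia_tpd = baseline_tpd * 1.30         # the claimed 30% perf-per-dollar gain

print(f"Baseline: {baseline_tpd:,.0f} tokens/$")
print(f"Maia 200: {maia_tpd:,.0f} tokens/$")
```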

