TechRadar
Efosa Udinmwen

No Nvidia, no AMD, no Intel, no ARM: Meta plans inference-led RISC-y future without friends as 1,700W superchip emerges with 30 PFLOPs of performance and half a terabyte (yes, 512GB) of HBM

Meta Custom Silicon.

  • Meta’s 1,700W superchip delivers 30 PFLOPs and 512GB of HBM
  • MTIA 450 and 500 prioritize inference over pre-training workloads
  • Future MTIA generations will support GenAI inference and ranking workloads

Meta is advancing its AI infrastructure with a portfolio of custom MTIA chips designed specifically for inference workloads across its apps.

The company is developing a 1,700W superchip capable of 30 PFLOPs and 512GB of HBM, integrated within the same MTIA infrastructure to handle inference tasks at scale.

Interestingly, it is achieving this feat without any of its friends — no Nvidia, AMD, Intel, or ARM.

Meta scales inference with extensive MTIA deployment

According to Meta, hundreds of thousands of MTIA chips are already deployed in production, supporting ranking, recommendations, and ad-serving workloads.

These chips are part of a full-stack system optimized for Meta’s specific requirements, achieving higher compute efficiency than general-purpose hardware for its intended workloads.

Unlike Google, AWS, Microsoft, and Apple, Meta is pursuing a fully custom silicon strategy.

This design prioritizes efficiency over general-purpose use, allowing inference to run more cost-effectively than on mainstream GPUs or CPUs.

It maintains compatibility with industry-standard software such as PyTorch, vLLM, and Triton.

Meta’s MTIA roadmap anticipates four new generations of chips over the next two years, including MTIA 300, currently in production for ranking and recommendations.

Future generations — MTIA 400, 450, and 500 — will expand support for GenAI inference workloads, with designs capable of fitting into existing rack infrastructure.

Meta emphasizes rapid, iterative development, releasing new chips roughly every six months through modular and reusable designs.

The modular design allows new chips to drop into existing rack systems, reducing deployment friction and accelerating time to production.

The approach allows the company to adopt emerging AI techniques and hardware improvements faster than competitors, whose chip generations typically cycle every one to two years.

Unlike most mainstream AI chips that prioritize large-scale GenAI pre-training and later adapt for inference, Meta’s MTIA 450 and 500 focus first on inference workloads.

The chips can also support other tasks, including ranking and recommendations training or GenAI training, but their design keeps them tuned to anticipated growth in inference demand.

Meta’s system-level design aligns with Open Compute Project standards, enabling frictionless deployment in data centers while maintaining high compute efficiency.

The company acknowledges that no single chip can handle the full spectrum of its AI workloads.

This is why it is deploying multiple MTIA generations alongside complementary silicon from other vendors.

The strategy aims to balance flexibility and performance while accelerating innovation toward personal superintelligence.

