Etched, a startup that builds transformer-focused chips, just announced Sohu, an application-specific integrated circuit (ASIC) that it claims beats Nvidia’s H100 at LLM inference. A single 8xSohu server is said to match the performance of 160 H100 GPUs, meaning data centers could save on both upfront and operating costs if Sohu meets expectations.
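To put that claim in per-chip terms, here is a minimal Python sketch of the arithmetic. The function name and defaults are ours for illustration; the only inputs are Etched's own unverified figures (one 8-chip server, 160 H100 GPUs).

```python
def h100_equivalents_per_chip(h100_equivalents: float = 160,
                              chips_per_server: int = 8) -> float:
    """Return how many H100 GPUs one Sohu chip would replace.

    Both defaults come from Etched's claim, not from independent benchmarks.
    """
    if chips_per_server <= 0:
        raise ValueError("chips_per_server must be positive")
    return h100_equivalents / chips_per_server


ratio = h100_equivalents_per_chip()
print(f"One Sohu chip would stand in for about {ratio:.0f} H100 GPUs")
```

If the claim holds, each Sohu chip would do the inference work of roughly 20 H100s.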
According to the company, current AI accelerators, whether CPUs or GPUs, are designed to work with many different AI architectures. The hardware must therefore support a variety of models, such as convolutional neural networks, long short-term memory networks, state space models, and so on. Because each of these models is structured differently, most current AI chips devote a large portion of their resources to programmability.
Most large language models (LLMs) use matrix multiplication for the majority of their compute tasks, and Etched estimates that Nvidia’s H100 GPUs use only 3.3% of their transistors for this key task. That means the remaining 96.7% of the silicon is used for other tasks, which are still essential for general-purpose AI chips.
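A rough Python sketch shows why matrix multiplication dominates the compute in a transformer layer. It counts floating-point operations, not transistors, so it illustrates the workload rather than Etched's 3.3% figure. The layer dimensions and the per-element cost of the non-matmul steps are our assumptions, chosen only for illustration.

```python
def matmul_flop_share(d_model: int = 4096, seq_len: int = 2048,
                      n_heads: int = 32, ffn_mult: int = 4,
                      elementwise_ops: int = 8) -> float:
    """Estimate the share of one transformer layer's FLOPs spent in matmuls.

    All defaults are illustrative assumptions, not figures from Etched or
    Nvidia. `elementwise_ops` is a guessed cost per element for softmax,
    layer norm, and the activation function.
    """
    n, d = seq_len, d_model

    # A matmul of shapes (m, k) x (k, p) costs about 2 * m * k * p FLOPs.
    projections = 4 * 2 * n * d * d           # Q, K, V, and output projections
    attention = 2 * 2 * n * n * d             # Q @ K^T, then scores @ V
    feed_forward = 2 * 2 * n * d * (ffn_mult * d)  # up and down projections
    matmul = projections + attention + feed_forward

    # Everything else is elementwise or reduction work.
    softmax = elementwise_ops * n_heads * n * n
    layer_norms = 2 * elementwise_ops * n * d
    activation = elementwise_ops * n * ffn_mult * d
    residuals = 2 * n * d
    other = softmax + layer_norms + activation + residuals

    return matmul / (matmul + other)


print(f"Estimated matmul share of layer FLOPs: {matmul_flop_share():.1%}")
```

Under these assumptions, well over 99% of the arithmetic is matrix multiplication, which is the imbalance Etched is pointing at.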
However, the transformer architecture has become very popular lately. For example, ChatGPT, arguably the most popular LLM today, is based on a transformer model. In fact, it’s in the name: Chat generative pre-trained transformer (GPT). Competing models like Sora, Gemini, Stable Diffusion, and DALL-E are also based on transformers.
Etched made a huge bet on transformers a couple of years ago when it started the Sohu project. The chip bakes the transformer architecture into the hardware, allowing it to allocate more transistors to AI compute. We can liken this to processors and graphics cards: say current AI chips are CPUs, which can do many different things, and the transformer model is like the graphics demands of a game. Sure, the CPU can still handle those graphics demands, but it won’t do so as fast or as efficiently as a GPU. A GPU makes graphics rendering faster and more efficient because its hardware is designed specifically for that job.
This is what Etched did with Sohu. Instead of making a chip that can accommodate every AI architecture, it built one that only works with transformer models. When the company started the project in 2022, ChatGPT didn’t even exist. But ChatGPT exploded in popularity in 2023, and the company’s gamble now looks like it is about to pay off in a big way.
Nvidia is currently one of the most valuable companies in the world, and it has posted record revenues ever since demand for AI GPUs surged. It shipped 3.76 million data center GPUs in 2023, and that figure is on track to grow this year. But Sohu’s launch could threaten Nvidia’s leadership in the AI space, especially if companies that exclusively use transformer models move to Sohu. After all, efficiency is the key to winning the AI race, and anyone who can run these models on the fastest, most affordable hardware will take the lead.
Ever since AI data centers started popping up left and right, many experts have raised concerns about the power crisis this energy-hungry infrastructure could cause. Meta founder Mark Zuckerberg says electricity supply will constrain AI growth, and even the U.S. government has stepped in to discuss AI power demands. All the GPUs sold last year consume more power than 1.3 million homes, but if Etched’s approach to AI computing with Sohu takes off, AI power demands could perhaps be reduced to more manageable levels, giving the electricity grid time to catch up as our computing needs grow more sustainably.