As cloud datacenters grow ever larger and more complex, providers are increasingly developing their own chips to eke out performance, efficiency, and cost advantages over their competitors.
Today, the top cloud providers employ a suite of in-house silicon for everything from general compute to networking, storage, and AI training and inference. According to Dell’Oro analyst Baron Fung, this trend is likely to accelerate over the next few years as cloud providers and hyperscalers look to diversify their supply chains.
It’s hard to talk about cloud silicon development without pointing to AWS, for which chip development has become a core component of its business, with its Graviton CPUs estimated to power one in five cloud instances on EC2. However, it’s not alone.
Earlier this summer, Google launched the fifth generation of its AI/ML accelerators, which it calls Tensor Processing Units (TPUs). Meanwhile, in China, Alibaba, Baidu, and Tencent are working on all manner of custom silicon, from AI acceleration to data processing and even Arm CPUs. And last we heard, Microsoft was looking to hire a couple of electrical engineers to develop custom datacenter chips of its own, potentially to compete with AWS's Graviton.
A lot of cloud silicon may as well be invisible
But while Graviton is a prime example of just how far hyperscalers are willing to go to optimize their compute infrastructure, the chip is something of an outlier. The majority of custom chips developed by the major cloud providers are designed for internal use or are entirely invisible from a customer perspective.
Data processing units (DPUs) and smartNICs are a prime example. Nearly every cloud provider and hyperscaler on the market has developed some kind of custom NIC for its servers to offload I/O processing.
AWS has its Nitro cards; Google has commissioned specialized smartNICs from Intel; custom smartNICs power Microsoft’s Azure Accelerated Networking stack, and the company acquired DPU startup Fungible in January. The core value proposition of these devices is preventing storage, networking, and security services from taking CPU cycles away from tenant workloads.
In some cases, customer-facing features like high-speed storage networks or cryptographic offload — AWS’s Nitro TPM for example — may be tied to instances backed by these cards. However, for the most part, the work these chips do is largely invisible to the end user.
It’s a similar, albeit evolving, situation when you start talking about custom accelerators for things like AI/ML. Both Google and AWS have been building custom AI accelerators for training and inference workloads for years now. Google’s TPUs, AWS’s Trainium and Inferentia, and Baidu’s Kunlun chips are just a few examples.
And while customers can spin up jobs on these chips, they tend to be optimized for the cloud provider’s internal workloads first, Fung said.
While training custom LLMs to power ChatGPT-style chatbots like Google Bard or Bing Chat is all the rage right now, cloud providers have been running machine-learning workloads, like recommender engines and natural language processing, for years.
“We have a lot of internal properties, like Alexa’s voice synthesis runs on Inferentia; the search you do on amazon.com, that actually runs on Inferentia; the recommendations that you’ll see on amazon.com for products and things you might be interested in, that runs on Inferentia,” Chetan Kapoor, director of product management for Amazon EC2, told The Register.
Custom silicon won’t replace commercial chips
Whether it’s a general-purpose processor like AWS’s Graviton or a purpose-built ML chip like Google’s TPU, custom silicon only makes economic sense at a degree of scale seen among the very largest cloud providers and hyperscalers, Fung explained.
“For some of the more general purpose equipment like the CPUs and NICs, it makes sense to build your own when they meet a certain volume threshold,” he said.
Fung puts that tipping point at somewhere around a million units a year. For more niche compute products, however, he notes that cloud providers may be motivated more by supply chain concerns and a desire to diversify their hardware stack.
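To see roughly why the threshold lands where it does, here's a back-of-envelope break-even sketch. Every cost figure in it is a hypothetical placeholder chosen for illustration, not real silicon economics or anything Fung cited:

```python
# Hypothetical break-even math for rolling your own chip.
# Every figure below is an illustrative placeholder, not a real cost.
nre_cost = 500e6          # one-off design, verification, and tape-out spend (assumed)
custom_unit_cost = 350.0  # per-chip cost to manufacture in volume (assumed)
commercial_price = 900.0  # price of a comparable off-the-shelf part (assumed)

savings_per_unit = commercial_price - custom_unit_cost
break_even_units = nre_cost / savings_per_unit
print(f"Break-even at roughly {break_even_units:,.0f} units")  # ~909,091
```

Below that kind of volume, buying merchant silicon is the cheaper option; above it, the in-house part starts paying for itself, which is broadly the logic behind a tipping point in the region of a million units a year.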
In other words, custom silicon offers cloud providers a means to hedge their bets, especially in markets dominated by a single vendor like Nvidia. Looking ahead, “I think we’ll see more custom accelerator deployments,” Fung said.
However, he doesn’t expect cloud silicon will displace chipmakers like Intel, AMD, or Nvidia. Nvidia has built up a lot of momentum around its GPUs thanks in no small part to a robust software ecosystem. Because of this, the majority of large language models today run on Nvidia hardware.
So it’s no surprise that cloud providers aren’t just investing in their own chips, but buying massive quantities of Nvidia’s A100s and H100s. Google plans to deploy something like 6,569 H100 GPUs to power its A3 AI supercomputer, which will eventually scale to 26 exaFLOPS of what we assume to be FP8 performance.
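That FP8 assumption is easy to sanity-check with some quick arithmetic, shown below; this is our own back-of-envelope reckoning, not a figure from Google:

```python
# Back-of-envelope check on the 26 exaFLOPS claim (our arithmetic, not Google's).
total_flops = 26e18   # 26 exaFLOPS at full scale
gpu_count = 6_569     # H100 GPUs cited for the A3 supercomputer
per_gpu_pflops = total_flops / gpu_count / 1e15
print(f"{per_gpu_pflops:.2f} PFLOPS per GPU")  # ~3.96 PFLOPS
# That lines up with the H100's peak FP8 throughput with sparsity (~3.96 PFLOPS),
# hence the assumption that the headline number is FP8.
```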
Microsoft, meanwhile, is deploying “tens of thousands of Nvidia A100 and H100 GPUs” to power its AI services. And Meta’s AI research facility employs 16,000 Nvidia A100s, and the Social Network is reportedly purchasing massive quantities of H100s for use in its Grand Teton server platform.
With that said, Kapoor tells us demand for generative AI hardware is driving considerable growth for AWS’s custom accelerators. “We’re starting to see a similar interest from customers that are using large-scale compute today and are excited about the prospect of lowering their cost or getting access to compute capacity in general,” he said of Trainium.
The future of cloud silicon
Looking to the future, Fung expects a variety of factors to drive cloud silicon development, ranging from power and space constraints to AI demand and geopolitical issues.
In the US, Fung anticipates much of that development will focus on AI accelerators, mostly among the largest hyperscalers and cloud providers. Smaller players, he expects, will likely stick to commercial silicon from the major chipmakers.
Fung doesn’t expect to see much competition for AWS’s Graviton CPUs coming from other cloud providers. “There’s always rumors about hyperscalers developing their own Arm CPUs, but there are alternatives right now,” he said, pointing to readily available Arm CPUs from Ampere and potential developments from Qualcomm and Marvell. The CPU market is far more diverse than it was when AWS debuted Graviton in 2018.
The only exception would be in China, where geopolitical pressure from the West has driven many large cloud providers and webscalers to develop all manner of custom silicon for fear of being cut off from US-developed chips. “We’ll likely see more custom Arm deployment in China in particular,” he said.
Last fall, we learned that Alibaba Cloud planned to transition about a fifth of its systems over to its in-house Yitian 710 processor. Announced back in 2021, the chip packs 128 Armv9 cores clocked at 3.2GHz, supports DDR5 memory, and offers 96 lanes of PCIe 5.0.
However, as Arm recently noted in its IPO filing, there’s the distinct possibility that US regulations could further limit its ability to do business in the Middle Kingdom. The company is already barred from licensing its top-specced Neoverse datacenter cores in the country.
China is well aware of this possibility. In December, the Chinese government reportedly tapped Alibaba and Tencent to design sanction-proof RISC-V chips. ®