IBM claims it can cut the cost of running AI models in the cloud with its custom silicon, as it looks to cash in on the surge of interest in generative models like ChatGPT.
Tech companies have been desperate to exploit the new interest in AI sparked by ChatGPT, even though that interest may now be on the wane, with traffic to OpenAI’s website falling by an estimated 10 percent between May and June.
IBM said it is considering the use of its own custom AI chips to lower the costs of operating its Watsonx services in the cloud.
Watsonx, announced in May, is actually a suite of three products designed for enterprise customers trying out foundation models and generative AI to automate or accelerate workloads. These can run on multiple public clouds, as well as on-premises.
IBM’s Mukesh Khare told Reuters that the company is now looking to use a chip called the Artificial Intelligence Unit (AIU) as part of its Watsonx services operating on IBM Cloud. He blamed the failure of Big Blue’s old Watson system on high costs, and claimed that by using its AIU, the company can lower the cost of AI processing in the cloud because the chips are power efficient.
Unveiled last October, the AIU is an application-specific integrated circuit (ASIC) featuring 32 processing cores, and is described by IBM as a version of the AI accelerator built into the Telum chip that powers the z16 mainframe. It fits into a PCIe slot in any computer or server.
Amazon aims to cut costs
Meanwhile, Amazon said it is also looking to attract more customers to its AWS cloud platform by competing on price, claiming it can offer lower costs for training and operating models.
The cloud giant’s veep of AWS Applications, Dilip Kumar, said that AI models behind services such as ChatGPT require considerable amounts of compute power to train and operate, and that these are the kinds of costs Amazon Web Services (AWS) is historically good at lowering.
According to some estimates, ChatGPT may have used more than 570GB of data for training, and required over 1,000 of Nvidia’s A100 GPUs to handle the processing.
Kumar commented at the Momentum conference in Austin that the latest generation of AI models is expensive to train for this reason, adding: “We’re taking on a lot of that undifferentiated heavy lifting, so as to be able to lower the cost for our customers.”
Plenty of organizations already have their data stored in AWS, Kumar opined, making this a good reason to choose Amazon’s AI services. This is especially so when customers may be hit with egress charges to move that data anywhere else.
However, cloud providers may not be ready to meet the new demand for AI services, according to some experts. The Wall Street Journal notes that the new breed of generative AI models can be anything from 10 to 100 times bigger than older versions, and need infrastructure backed by accelerators such as GPUs to speed processing.
Only a small proportion of the bit barns operated by public cloud providers is made up of high-performance nodes fitted with such accelerators that can be assigned to AI processing tasks, according to Chetan Kapoor, AWS’s director of product management for EC2, who said there is “a pretty big imbalance between demand and supply”.
This hasn’t stopped cloud companies from expanding their AI offerings. Kapoor said that AWS intends to grow its AI-optimized server clusters over the next year, while Microsoft’s Azure and Google Cloud are also said to be increasing their AI infrastructure.
Microsoft also announced a partnership with GPU maker Nvidia last year, which saw tens of thousands of Nvidia’s A100 and H100 GPUs integrated into Azure to power GPU-based server instances, along with Nvidia’s AI software stack.
Meanwhile, VMware is also looking to get in on the act, announcing plans this week to enable generative AI to run on its platform. The aim is to make it easier for customers to operate large language models efficiently in a VMware environment, potentially using resources housed across multiple clouds. ®