
Barclays forecasts that chip-related capital expenditure for consumer AI inference alone will approach $120 billion in 2026 and exceed $1.1 trillion by 2028.
Barclays also noted that LLM providers such as OpenAI are being forced to look at custom chips, mainly ASICs, instead of GPUs to reduce the cost of inference and move toward profitability.
The case for Google TPUs
Inference consumes over 50% of OpenAI’s compute budget, and TPUs, particularly older generations, offer a significantly lower cost per inference than Nvidia GPUs, Dai said, explaining the significance of TPUs for OpenAI.
“While older TPUs lack the peak performance of newer Nvidia chips, their dedicated architecture minimizes energy waste and idle resources, making them more cost-effective at scale,” Dai added.
Omdia principal analyst Alexander Harrowell agreed with Dai.
“…a lot of AI practitioners will tell you they get (from TPUs) a better ratio of floating-point operations per second (FLOPS) — a unit of measuring computational performance — utilized to theoretical maximum performance than they do with anything else,” Harrowell said.
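The ratio Harrowell describes is often called model FLOPS utilization: achieved throughput divided by the chip's theoretical peak. A minimal sketch of that calculation, using hypothetical throughput figures chosen purely for illustration (not measured values for any TPU or GPU):

```python
def flops_utilization(achieved_tflops: float, peak_tflops: float) -> float:
    """Ratio of achieved throughput to theoretical peak performance."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return achieved_tflops / peak_tflops

# Hypothetical numbers for illustration only.
tpu_util = flops_utilization(achieved_tflops=55.0, peak_tflops=100.0)
gpu_util = flops_utilization(achieved_tflops=35.0, peak_tflops=100.0)
print(f"TPU utilization: {tpu_util:.0%}, GPU utilization: {gpu_util:.0%}")
```

Under these assumed numbers, the chip with lower peak performance can still deliver more useful work per dollar if its utilization is higher, which is the crux of the cost-per-inference argument.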