At Google Cloud Next '25, Google announced Ironwood, its seventh-generation Tensor Processing Unit (TPU), designed to be the company's most powerful, scalable and energy-efficient custom AI accelerator to date. Notably, Ironwood is the first TPU specifically optimised for inference – the process in which a trained model uses its learned knowledge to make predictions or draw conclusions from new, unseen data.
Google says Ironwood represents a shift in AI development, built to meet the growing demands of generative AI and the rising importance of inference in AI applications.
That shift involves AI agents that proactively retrieve and generate data to deliver insights and answers, rather than just raw data. Ironwood, Google says, is designed specifically to handle the computational and communication demands of this inference era.
Ironwood a key component of Google Cloud's AI Hypercomputer architecture
Google notes that the architecture of Ironwood is built for scale, supporting up to 9,216 liquid-cooled chips interconnected by a high-speed Inter-Chip Interconnect (ICI) network and drawing nearly 10 MW of power. Ironwood is a key component of Google Cloud's AI Hypercomputer architecture, which optimises hardware and software together for AI workloads.
Developers can use Google's Pathways software stack to harness the combined processing power of tens of thousands of Ironwood TPUs.
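Google has not tied the announcement to specific code, but the programming model Pathways supports will be familiar from JAX on Cloud TPU: a single program is compiled and sharded across every attached chip. A minimal, hypothetical sketch of that pattern (all shapes and names here are illustrative, not Ironwood-specific):

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange all visible accelerator chips into a 1-D logical mesh.
# The same program scales from a handful of chips to a full pod.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

# Shard the batch dimension across chips; replicate the weights.
batch_sharding = NamedSharding(mesh, P("data", None))
replicated = NamedSharding(mesh, P())

x = jax.device_put(jnp.ones((4096, 8192)), batch_sharding)  # activations
w = jax.device_put(jnp.ones((8192, 1024)), replicated)      # weights

@jax.jit
def forward(x, w):
    # The compiler partitions this matmul across the mesh: each chip
    # multiplies its batch shard by the replicated weights.
    return jnp.dot(x, w)

y = forward(x, w)
print(y.shape, y.sharding)
```

The appeal of this model is that the sharding annotations, not the program logic, determine how work is distributed, so the same code runs unchanged as the chip count grows.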
Key innovations in Ironwood
Google says Ironwood is engineered to efficiently manage the complex computation and communication required by “thinking models,” including Large Language Models (LLMs), Mixture of Experts (MoEs), and advanced reasoning tasks. These models demand massive parallel processing and high-speed memory access.
To support the distributed nature of these workloads, especially with models that exceed the capacity of a single chip, Ironwood incorporates a low-latency, high-bandwidth ICI network for synchronised communication across large TPU pods, Google explains.
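The announcement does not go into ICI programming details, but the pattern it implies (chip-local computation followed by a synchronised collective over the interconnect) is what JAX's collective primitives express. A small, hypothetical sketch; shapes and values are illustrative only:

```python
import functools
import jax
import jax.numpy as jnp

n = jax.local_device_count()

# One (hypothetical) shard of model state per attached chip.
shards = jnp.arange(n * 4, dtype=jnp.float32).reshape(n, 4)

@functools.partial(jax.pmap, axis_name="chips")
def step(local_shard):
    partial = jnp.sum(local_shard)  # chip-local work
    # psum is a synchronised all-reduce: every chip blocks until the
    # partial sums have been combined across all participating chips.
    return jax.lax.psum(partial, axis_name="chips")

print(step(shards))  # each chip ends up holding the same global sum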
Google Cloud will offer Ironwood in two sizes to suit varying workload requirements: a 256-chip configuration and a full 9,216-chip configuration.