Google launches Ironwood, its first TPU to run its thinking AI models

At Google Cloud Next 25, Google Cloud announced Ironwood, its seventh-generation Tensor Processing Unit (TPU), designed to be the company’s most powerful, scalable and energy-efficient custom AI accelerator to date. Notably, Ironwood is the first TPU optimised specifically for inference – the process in which a trained model uses its learned knowledge to make predictions or draw conclusions from new, unseen data.
Google says that Ironwood is built to meet the evolving needs of generative AI and the growing importance of inference in AI applications.
This shift involves AI agents that retrieve and generate data to deliver collaborative insights and answers, rather than just raw data. Google says Ironwood is designed specifically to handle the computational and communication demands of this inference era.
Ironwood a key component of Google Cloud's AI Hypercomputer architecture
Google notes that Ironwood’s architecture is built for scale, supporting up to 9,216 liquid-cooled chips linked by a high-speed Inter-Chip Interconnect (ICI) network, with a full-scale pod drawing nearly 10 MW of power. Ironwood is a key component of Google Cloud's AI Hypercomputer architecture, which optimises hardware and software for AI workloads.
Developers can leverage Google's Pathways software stack to effectively harness the combined processing power of tens of thousands of Ironwood TPUs.
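For developers, the usual entry point on Cloud TPUs is JAX, the open-source framework that Google’s TPU software stack builds on (Pathways itself is Google-internal). As a rough, illustrative sketch – not code from Google’s announcement, with shapes and values chosen purely for the example – a compiled matrix multiply on whatever TPU cores are attached might look like this:

```python
# A minimal, hypothetical sketch of driving Cloud TPU cores from JAX.
# Shapes, dtypes and values are illustrative assumptions only.
import jax
import jax.numpy as jnp

print("Accelerator cores visible to JAX:", jax.device_count())

@jax.jit  # XLA compiles this function once for the attached accelerator
def matmul(a, b):
    return jnp.dot(a, b)

a = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
b = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
print(matmul(a, b).shape)  # (1024, 1024)
```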

Key innovations in Ironwood
Google says Ironwood is engineered to efficiently manage the complex computation and communication required by “thinking models,” including Large Language Models (LLMs), Mixture of Experts (MoEs), and advanced reasoning tasks. These models demand massive parallel processing and high-speed memory access.
To support the distributed nature of these workloads, especially with models exceeding the capacity of a single chip, Ironwood incorporates a low-latency, high-bandwidth ICI network for synchronised communication across large TPU pods, Google explains.
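To give a sense of what such distribution looks like from the developer’s side, here is a hedged illustration using JAX’s public sharding API; the mesh axis name and array sizes are assumptions for the example, not details Google has disclosed about Ironwood pods:

```python
# A hedged sketch of splitting one large array across several chips so
# each chip computes on only its slice; any cross-chip traffic XLA
# inserts travels over the pod's interconnect. The axis name "data"
# and all sizes are illustrative assumptions.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

n = jax.device_count()
mesh = Mesh(mesh_utils.create_device_mesh((n,)), axis_names=("data",))

x = jnp.ones((8192, 4096))  # assumes 8192 is divisible by n
# Place row-slices of x on different chips along the "data" axis.
x_sharded = jax.device_put(x, NamedSharding(mesh, P("data", None)))

@jax.jit
def normalise(v):
    # Each chip scales its own slice; the global max requires one
    # cross-chip reduction, which XLA maps onto the interconnect.
    return v / jnp.max(v)

y = normalise(x_sharded)
print(y.sharding)  # the result stays sharded across the chips
```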
Google Cloud will offer Ironwood in two configurations to suit varying workload requirements: a 256-chip pod and a larger 9,216-chip pod.
About the Author
TOI Tech Desk

The TOI Tech Desk is a dedicated team of journalists committed to delivering the latest and most relevant news from the world of technology to readers of The Times of India. TOI Tech Desk’s news coverage spans a wide spectrum across gadget launches, gadget reviews, trends, in-depth analysis, exclusive reports and breaking stories that impact technology and the digital universe. Be it how-tos or the latest happenings in AI, cybersecurity, personal gadgets, platforms like WhatsApp, Instagram, Facebook and more; TOI Tech Desk brings the news with accuracy and authenticity.
