Google unveils eighth-generation TPUs as AI agents push cloud computing into a new phase
Google used its Cloud Next 2026 event on April 22 to introduce a new generation of custom AI chips designed around the demands of agentic workloads, signaling that the next phase of enterprise AI will be shaped as much by infrastructure economics as by model quality. The company said its first-party models are now processing more than 16 billion tokens per minute through direct API use, up from 10 billion in the previous quarter.
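For scale, a quick back-of-the-envelope calculation, using only the figures Google cited and assuming the per-minute rate holds around the clock, puts those numbers in perspective:

```python
# Back-of-the-envelope math on the token throughput Google reported.
# The daily extrapolation assumes the per-minute rate is sustained 24/7.

current_rate = 16e9  # tokens per minute, reported at Cloud Next 2026
prior_rate = 10e9    # tokens per minute, reported the previous quarter

growth = (current_rate - prior_rate) / prior_rate
tokens_per_day = current_rate * 60 * 24

print(f"Quarter-over-quarter growth: {growth:.0%}")                  # 60%
print(f"Implied daily volume: {tokens_per_day / 1e12:.1f}T tokens")  # ~23.0T
```

That works out to roughly 23 trillion tokens a day at the new rate, a 60% jump in a single quarter.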
Google’s TPU 8t and TPU 8i target separate AI bottlenecks
The centerpiece of the announcement is Google’s eighth-generation Tensor Processing Unit family, split into two specialized chips: TPU 8t for training and TPU 8i for inference. Google said TPU 8t can scale to 9,600 chips and 2 petabytes of shared high-bandwidth memory in a single superpod, while TPU 8i is built to reduce latency for real-time workloads and to support concurrent agent execution at high throughput.
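Those superpod numbers imply a sizable memory budget per chip. As a rough sketch, assuming the 2 petabytes of shared HBM are spread evenly across all 9,600 chips and reading PB as decimal petabytes (Google did not publish a per-chip figure):

```python
# Rough per-chip HBM estimate for a maxed-out TPU 8t superpod.
# Assumes even distribution of the shared pool across all chips and
# decimal units (1 PB = 10**15 bytes); Google did not break this down.

total_hbm_bytes = 2e15  # 2 PB of shared high-bandwidth memory
chips = 9_600           # chips in a single superpod

per_chip_gb = total_hbm_bytes / chips / 1e9
print(f"~{per_chip_gb:.0f} GB of HBM per chip")  # ~208 GB
```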
That division reflects a practical shift in AI infrastructure. Training frontier models still requires massive compute, but enterprise deployments increasingly hinge on how cheaply and quickly systems can answer users, route tasks, and run many agents at once. Google is positioning the new chips to address both sides of that equation.
Cloud Next frames AI agents as a core enterprise workload
The TPU launch came alongside Google’s broader pitch that AI agents are becoming a mainstream cloud workload rather than an experimental feature. The company said nearly 75% of Google Cloud customers are using its AI products, and it highlighted 330 customers that processed more than a trillion tokens each over the past 12 months.
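To put that trillion-token threshold in context, here is a quick sketch of the sustained rate it implies, assuming usage is spread evenly across the year:

```python
# What "a trillion tokens in 12 months" means as a sustained rate,
# assuming the usage is spread evenly across the year.

tokens = 1e12
minutes_per_year = 365 * 24 * 60  # 525,600

rate_per_minute = tokens / minutes_per_year
print(f"~{rate_per_minute / 1e6:.1f}M tokens per minute")  # ~1.9M
```

In other words, each of those 330 customers averaged nearly two million tokens per minute, around the clock, for a full year.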
Google also introduced its Gemini Enterprise Agent Platform, which it described as a way to build, scale, govern and optimize agents. In the same set of announcements, the company said it is expanding AI-powered security tooling and integrating new agentic solutions with its threat-detection and cloud security products.
The hardware bet is about margin, latency and control
For Google Cloud, custom silicon remains a strategic lever as AI usage rises. Chips built for specific workloads can lower operating costs, reduce dependence on outside accelerators, and give the company tighter control over performance for customers running large-scale inference and agent orchestration.
Google said more than half of its 2026 machine learning compute investment will be directed toward the Cloud business, reflecting how central AI infrastructure has become to the company’s commercial plan. The TPUs will sit alongside NVIDIA GPU instances in Google Cloud, but the message from Cloud Next was clear: Google wants to make its own silicon a primary route for customers building the agentic stack.
The company said more details will follow through its Cloud Next announcements and at Google I/O on May 19, but the immediate takeaway is already concrete: Google is tuning its infrastructure for an AI market that is no longer just about model training, but about the cost, speed and reliability of putting those models to work at scale.
Source: Google Cloud Next 2026 news and updates; Sundar Pichai shares news from Google Cloud Next 2026
Date: 2026-04-22