Back at I/O in May, we announced Trillium, the sixth generation of our custom-designed chip known as the Tensor Processing Unit, or TPU. Today, we announced that it’s now available to Google Cloud customers in preview. TPUs power the AI that makes your Google devices and apps as helpful as possible, and Trillium is the most powerful and sustainable TPU yet.
But what exactly is a TPU? And what makes Trillium “custom”? To really understand what makes Trillium so special, it’s important to learn not only about TPUs, but also other types of compute processors — CPUs and GPUs — as well as what makes them different. As a product manager who works on AI infrastructure at Google Cloud, Chelsie Czop knows exactly how to break it all down. “I work across multiple teams to make sure our platforms are as efficient as possible for our customers who are building AI products,” she says. And what makes a lot of Google’s AI products possible, Chelsie says, are Google’s TPUs.
Let’s start with the basics! What are CPUs, GPUs and TPUs?
These are all chips that work as processors for compute tasks. Think of your brain as a computer that can do things like reading a book or doing a math problem. Each of those activities is similar to a compute task. So if you use your phone to take a picture, send a text or open an application, your phone’s brain, or processor, is doing those compute tasks.
What do the different acronyms stand for?
Even though CPUs, GPUs and TPUs are all processors, they’re progressively more specialized. CPU stands for Central Processing Unit. These are general-purpose chips that can handle a diverse range of tasks. As with your brain, some tasks may take longer if the CPU isn’t specialized in that area.
Then there’s the GPU, or Graphics Processing Unit. GPUs have become the workhorse of accelerated compute tasks, from graphics rendering to AI workloads. They’re what’s known as a type of ASIC, or application-specific integrated circuit. Integrated circuits are generally made using silicon, so you might hear people refer to chips as “silicon”; they’re the same thing (and yes, that’s where the term “Silicon Valley” comes from!). In short, ASICs are designed for a single, specific purpose.
The TPU, or Tensor Processing Unit, is Google’s own ASIC. We designed TPUs from the ground up to run AI-based compute tasks, making them even more specialized than CPUs and GPUs. TPUs have been at the heart of some of Google’s most popular AI services, including Search, YouTube and DeepMind’s large language models.
Got it, so all of these chips are what make our devices work. Where would I find CPUs, GPUs and TPUs?
CPUs and GPUs are inside very familiar items you probably use every day: You’ll find CPUs in just about every smartphone, and they’re in personal computing devices like laptops, too. A GPU you’ll find in high-end gaming systems or some desktop devices. TPUs you’ll only find in Google data centers: warehouse-style buildings full of racks and racks of TPUs, humming along 24/7 to keep Google’s, and our Cloud customers’, AI services running worldwide.
What made Google start thinking about creating TPUs?
CPUs were invented in the late 1950s, and GPUs came around in the late ’90s. Here at Google, we started thinking about TPUs about 10 years ago. Our speech recognition services were getting much better in quality, and we realized that if every user started “talking” to Google for just three minutes a day, we would need to double the number of computers in our data centers. We knew we needed something a lot more efficient than the off-the-shelf hardware available at the time, and we knew we were going to need a lot more processing power out of each chip. So, we built our own!
And that “T” stands for Tensor, right? Why?
Yep, a “tensor” is the generic name for the data structures used for machine learning. Basically, there’s a bunch of math happening under the hood to make AI tasks possible. With our latest TPU, Trillium, we’ve increased the number of calculations that can happen: Trillium has 4.7x the peak compute performance per chip of the prior generation, TPU v5e.
What does that mean, exactly?
It basically means that, at peak, Trillium can work through the calculations required to run that complex math up to 4.7 times as fast as the last version. Not only does Trillium work faster, it can also handle larger, more complicated workloads.
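To make the “tensor” idea a little more concrete, here is a minimal sketch in plain Python. It models tensors as nested lists and shows the kind of multiply-and-add math that happens under the hood. The helper names `shape` and `matmul` are made up for this illustration, and real machine learning frameworks use optimized array types rather than lists.

```python
# Illustrative only: tensors modeled as non-empty nested Python lists.
# `shape` and `matmul` are hypothetical helpers, not part of any TPU API.

def shape(tensor):
    """Return the dimensions of a regularly nested list, e.g. (2, 2)."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def matmul(a, b):
    """Multiply two rank-2 tensors (matrices) with plain loops."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


scalar = 3.0                                  # rank 0: a single number
vector = [1.0, 2.0, 3.0]                      # rank 1: a list of numbers
matrix = [[1.0, 2.0], [3.0, 4.0]]             # rank 2: a grid of numbers
image_batch = [[[0.0] * 4 for _ in range(4)]  # rank 3: e.g. two 4x4 images
               for _ in range(2)]

weights = [[0.5, 0.0], [0.0, 2.0]]
product = matmul(matrix, weights)             # many multiply-adds at once

print(shape(vector), shape(matrix), shape(image_batch))
print(product)
```

Even this tiny matrix product takes eight multiplications. AI models repeat the same pattern across tensors with millions or billions of entries, which is why a chip specialized for this math can make such a difference.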
Is there anything else that makes it an improvement over our last-gen TPU?
Another thing that’s better about Trillium is that it’s our most sustainable TPU yet — in fact, it’s 67% more energy-efficient than our last TPU. As the demand for AI continues to soar, the industry needs to scale infrastructure sustainably. Trillium essentially uses less power to do the same work.
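As a rough back-of-the-envelope sketch, here is what the two headline numbers could mean in practice. It rests on two assumptions that are not stated in the interview: that a workload is limited purely by peak compute, and that “67% more energy-efficient” means 1.67 times as much work per unit of energy. The function names are made up for this illustration.

```python
# Back-of-the-envelope arithmetic only; real workloads rarely hit peak numbers.
PEAK_COMPUTE_RATIO = 4.7   # from the article: Trillium vs. TPU v5e, per chip
EFFICIENCY_GAIN = 0.67     # from the article: 67% more energy-efficient


def best_case_time(baseline_seconds, ratio=PEAK_COMPUTE_RATIO):
    """Idealized runtime, assuming the job is purely compute-bound."""
    return baseline_seconds / ratio


def energy_for_same_work(baseline_energy, gain=EFFICIENCY_GAIN):
    """Energy for the same work, assuming (1 + gain) times the work per unit of energy."""
    return baseline_energy / (1.0 + gain)


time_on_trillium = best_case_time(470.0)         # roughly 100 seconds
energy_on_trillium = energy_for_same_work(100.0)  # roughly 60 units
savings_percent = 100.0 - energy_on_trillium      # roughly 40 percent

print(round(time_on_trillium, 1), round(energy_on_trillium, 1), round(savings_percent, 1))
```

Under those assumptions, a job that took 470 seconds of pure compute on the prior chip would take about 100 seconds at best, and the same amount of work would use roughly 40% less energy.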
Now that customers are starting to use it, what kind of impact do you think Trillium will have?
We’re already seeing some pretty incredible developments powered by Trillium! We have customers using it in technologies that analyze RNA for various diseases, turn written text into videos at incredible speeds and more. And that’s just from our initial round of users. Now that Trillium’s in preview, we can’t wait to see what people can do with it.