Deploying high-performance, energy-efficient AI | MIT Technology Review


Zane: Yes, I think over the last three or four years, there’ve been a number of initiatives, and Intel has played a big part in this as well, re-imagining how servers are engineered into modular components. Modularity for servers is exactly what it sounds like. We break the different subsystems of the server down into standard building blocks and define interfaces between those building blocks so that they can work together. That has a number of advantages. Number one, from a sustainability point of view, it lowers the embodied carbon of those hardware components. Some of these hardware components are quite complex and very energy intensive to manufacture. Imagine a 30-layer circuit board, for example: that is a pretty carbon-intensive piece of hardware. I don’t want to build the entire system that way if only a small part of it needs that kind of complexity. I can pay the price of the complexity just where I need it.

By being intelligent about how we break the design into different pieces, we bring that embodied carbon footprint down. Reusing pieces also becomes possible. When we upgrade a system, maybe to a new telemetry approach or a new security technology, there’s just a small circuit board that has to be replaced instead of the whole system. Or maybe a new microprocessor comes out, and the processor module can be replaced without investing in new power supplies, a new chassis, new everything. So that circularity and reuse becomes a significant opportunity, and the embodied carbon, which is about 10% of the carbon footprint in these data centers, can be significantly reduced. Another benefit of modularity, aside from sustainability, is that it brings R&D investment down. If I’m going to develop a hundred different kinds of servers and I can build them all from the very same building blocks, just configured differently, I’m going to invest less money and less time. That is a real driver of the move toward modularity as well.
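To make the modularity argument concrete, here is a minimal Python sketch that models a server as a set of swappable building blocks. The module names (`cpu-gen1`, `psu-a`, and so on) and the catalog sizes are hypothetical, not Intel specifications; the sketch only shows that an upgrade touches one block while the rest are reused, and that a small catalog of blocks multiplies into many server configurations.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from itertools import product

# Hypothetical building blocks; names are illustrative, not real part numbers.
@dataclass(frozen=True)
class Server:
    processor: str
    security: str
    power_supply: str
    chassis: str

def upgrade(server: Server, **new_modules: str) -> tuple[Server, list[str]]:
    """Swap only the named modules and report which ones actually changed."""
    upgraded = replace(server, **new_modules)
    changed = [name for name, part in new_modules.items() if getattr(server, name) != part]
    return upgraded, changed

old = Server(processor="cpu-gen1", security="sec-v1", power_supply="psu-a", chassis="chassis-2u")
new, changed = upgrade(old, processor="cpu-gen2")
print(changed)                               # ['processor']
print(new.power_supply == old.power_supply)  # True: power supply reused
print(new.chassis == old.chassis)            # True: chassis reused

# Hypothetical catalog (security module held fixed): 4 processor, 5 power-supply, 5 chassis designs.
processors = [f"cpu-{i}" for i in range(4)]
power_supplies = [f"psu-{i}" for i in range(5)]
chassis_options = [f"chassis-{i}" for i in range(5)]
configs = list(product(processors, power_supplies, chassis_options))
print(len(processors) + len(power_supplies) + len(chassis_options), len(configs))  # 14 100
```

In this toy model, 14 module designs yield 100 distinct server configurations, which mirrors the R&D point: the design effort scales with the number of blocks, not the number of servers.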

Laurel: So what are some of the techniques and technologies, like liquid cooling and ultra-high-density compute, that large enterprises can use to compute more efficiently? And what are their effects on water consumption, energy use, and overall performance, as you were outlining earlier?

Zane: Yeah, those are two very important opportunities, I think, so let’s take them one at a time. In the emerging AI world, I think liquid cooling is probably one of the most important low-hanging-fruit opportunities. In an air-cooled data center, a tremendous amount of energy goes into fans, chillers, and evaporative cooling systems, and that is a significant share of the total. If you move a data center to a fully liquid-cooled solution, that is an opportunity to save around 30% of energy consumption, which is sort of a wow number. I think people are often surprised by just how much energy is burned. If you walk into a data center, you almost need ear protection because it’s so loud. The hotter the components get, the higher the fan speeds get, and the more energy is burned on the cooling side. Liquid cooling takes a lot of that off the table.
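As a back-of-the-envelope illustration of the roughly 30% figure Zane cites, the sketch below applies that fraction to a facility’s annual energy use. The 10,000 MWh baseline is a made-up number, and the 30% is the speaker’s estimate rather than a measured result for any particular site.

```python
def energy_after_liquid_cooling(baseline_mwh: float, savings_fraction: float = 0.30) -> float:
    """Apply an assumed fractional reduction to total facility energy."""
    if not 0.0 <= savings_fraction < 1.0:
        raise ValueError("savings_fraction must be in [0, 1)")
    return baseline_mwh * (1.0 - savings_fraction)

baseline_mwh = 10_000.0  # hypothetical annual energy for an air-cooled facility
after_mwh = energy_after_liquid_cooling(baseline_mwh)
print(f"{after_mwh:,.0f} MWh/yr after, {baseline_mwh - after_mwh:,.0f} MWh/yr saved")
# 7,000 MWh/yr after, 3,000 MWh/yr saved
```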

What offsets that is that liquid cooling is a bit complex, and not everyone is able to fully utilize it. There are more upfront costs, but it actually saves money in the long run, so the total cost of ownership with liquid cooling is very favorable. As we engineer new data centers from the ground up, liquid cooling is a really exciting opportunity, and I think the faster we can move to it, the more energy we can save. But it’s a complicated world out there. There are a lot of different situations and a lot of different infrastructures to design around, so we shouldn’t trivialize how hard that is for an individual enterprise. One of the other benefits of liquid cooling is that we get out of the business of evaporating water for cooling. A lot of North American data centers are in arid regions and use large quantities of water for evaporative cooling.
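The trade-off Zane describes, higher upfront cost against lower running cost, is essentially a payback calculation. The sketch below uses invented dollar figures purely to show the shape of that reasoning; real capital and operating costs vary widely by site, and this simple version ignores discounting, maintenance differences, and water costs.

```python
def payback_years(extra_upfront: float, annual_savings: float) -> float:
    """Undiscounted payback: years until operating savings cover the extra upfront cost."""
    if annual_savings <= 0:
        return float("inf")
    return extra_upfront / annual_savings

def cumulative_cost(upfront: float, annual_opex: float, years: int) -> float:
    """Upfront cost plus a constant yearly operating cost."""
    return upfront + annual_opex * years

# Hypothetical figures in dollars, for illustration only.
air = {"upfront": 10.0e6, "annual_opex": 3.0e6}
liquid = {"upfront": 12.0e6, "annual_opex": 2.5e6}

years_to_break_even = payback_years(liquid["upfront"] - air["upfront"],
                                    air["annual_opex"] - liquid["annual_opex"])
print(years_to_break_even)  # 4.0

for years in (3, 5, 10):
    liquid_total = cumulative_cost(**liquid, years=years)
    air_total = cumulative_cost(**air, years=years)
    print(years, "liquid" if liquid_total < air_total else "air")
# prints: 3 air / 5 liquid / 10 liquid
```

With these made-up inputs, the liquid-cooled option costs more for the first few years and then pulls ahead, which is the “saves money in the long run” pattern in miniature.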

That is good from an energy consumption point of view, but the water consumption can be truly extraordinary. I’ve seen numbers getting close to a trillion gallons of water per year in North American data centers alone. And in humid climates, like Southeast Asia or eastern China, for example, evaporative cooling is not as effective, so much more energy is burned. If you want to reach really aggressive energy-efficiency numbers, you just can’t do it with evaporative cooling in those humid climates. So those geographies are kind of the tip of the spear for moving to liquid cooling.

The other opportunity you mentioned was density, and bringing ever-higher density to computing has been the trend for decades. That is effectively what Moore’s Law has been pushing us toward, and I think it’s important to realize that trend isn’t done yet. As much as we think about racks of GPUs and accelerators, we can still significantly improve energy consumption with higher-density traditional servers that allow us to pack what might have been a whole row of racks into a single rack of computing in the future. Those are substantial savings. At Intel, we’ve announced an upcoming processor with 288 CPU cores, and 288 cores in a single package enables us to build racks with as many as 11,000 CPU cores. The energy savings there are substantial, not just because those chips are very, very efficient, but because the amount of networking equipment and ancillary hardware around those systems is much smaller, since you’re using those resources more efficiently with very high-density components. So continuing, and perhaps even accelerating, our path to this kind of ultra-high-density computing is going to help us get the energy savings we need, perhaps to accommodate some of the larger models that are coming.
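The rack figure follows from simple multiplication. The sketch below checks how many 288-core packages fit under the quoted 11,000-core ceiling. The source does not say how those packages are arranged into servers, and the 1,024-core legacy rack used for comparison is a hypothetical baseline, not a number from the interview.

```python
CORES_PER_PACKAGE = 288     # from the interview
RACK_CORE_CEILING = 11_000  # "as many as 11,000 CPU cores" per rack

# Whole packages only; a fractional package cannot be installed.
packages_per_rack = RACK_CORE_CEILING // CORES_PER_PACKAGE
cores_per_rack = packages_per_rack * CORES_PER_PACKAGE
print(packages_per_rack, cores_per_rack)  # 38 10944

# Hypothetical older rack: 32 servers x 2 sockets x 16 cores = 1,024 cores.
legacy_cores_per_rack = 32 * 2 * 16
print(f"{cores_per_rack / legacy_cores_per_rack:.1f} legacy racks of cores in one dense rack")
# 10.7 legacy racks of cores in one dense rack
```

Under that assumed baseline, one dense rack holds roughly the core count of a row of about ten older racks, which is the consolidation effect Zane describes; the networking and ancillary savings he mentions are not modeled here.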

Laurel: Yeah, that definitely makes sense. And this is a good segue into another part of it, which is how data centers, hardware, and software can work together to create more energy-efficient technology without compromising function. So how can enterprises invest in more energy-efficient hardware and hardware-aware software, and, as you mentioned earlier, run large language models, or LLMs, on smaller, downsized infrastructure while still reaping the benefits of AI?

Zane: I think there are a lot of opportunities, and maybe the most exciting one I see right now is this: even as we’re pretty wowed and blown away by what these really large models are able to do, even though they require tens of megawatts of supercomputing power to do it, you can actually get a lot of those benefits with far smaller models, as long as you’re content to operate them within a specific knowledge domain. We’ve often referred to these as expert models. Take, for example, an open-source model like Llama 2, which Meta produced. There’s a 7-billion-parameter version of that model, and there are also, I think, 13- and 70-billion-parameter versions, compared to GPT-4, which is maybe something like a trillion-parameter model. So it’s far, far, far smaller, but you can fine-tune that model with data for a specific use case, and if you’re an enterprise, you’re probably working on something fairly narrow and specific that you’re trying to do.
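One way to see why a 7-billion-parameter model is so much cheaper to run is to estimate the memory needed just to hold its weights. The sketch assumes 16-bit weights (2 bytes per parameter) and ignores activations, caches, and any quantization. The trillion-parameter entry is the speaker’s rough guess for GPT-4, not a disclosed figure.

```python
BYTES_PER_PARAM = 2  # assumption: FP16/BF16 weights; excludes activations and KV cache

models = {
    "Llama 2 7B": 7e9,
    "Llama 2 13B": 13e9,
    "Llama 2 70B": 70e9,
    "~1T-parameter model (rough guess)": 1e12,
}

def weight_memory_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Gigabytes (10^9 bytes) needed just to store the weights."""
    return params * bytes_per_param / 1e9

for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params):,.0f} GB")
# Llama 2 7B: ~14 GB / Llama 2 13B: ~26 GB / Llama 2 70B: ~140 GB / ~1T-parameter model: ~2,000 GB

ratio = models["~1T-parameter model (rough guess)"] / models["Llama 2 7B"]
print(f"The 1T model is about {ratio:.0f}x larger than the 7B model")  # about 143x
```

At roughly 14 GB of weights, the 7B model fits on a single accelerator or even a well-equipped CPU server, whereas a trillion-parameter model needs its weights spread across many devices; that gap is a large part of the infrastructure savings Zane is pointing to.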
