The keynote speech given by Dr. Jack Dongarra at HPC Asia 2009 examines the history of high performance computing (HPC) from its beginnings in the 1950s, through the present, and into the very near future. The talk also takes an in-depth look at the TOP500 supercomputing list which was begun by Dr. Dongarra and several of his colleagues in 1993.
This extremely educational and enlightening talk also takes a look at current trends in HPC, such as “many-core” chips and GPUs, and examines future obstacles in the ongoing development of HPC.
Part 6: Future Trends: Gates, Cores, Cycle Times, Many-Core Chips, and GPUs
Recently, computer architects have been working on two things simultaneously to increase performance. The first thing they’ve been doing is increasing the number of gates on the chip and, at the same time, they’ve also been able to improve the cycle time, that is, raise the clock rate. Those two things in combination have been giving us incredible advances every time a new chip comes out. Now what we have here is a situation where the number of gates on the chip will continue to double every 18 to 24 months. That’s what my architect friends tell me, and that’s going to continue to happen. However, the cycle time is not going to improve, so the clock rate for our new chips will remain constant from this point on. In fact, it may even go down a bit.
So if we take a look at this curve here, the green line represents the number of gates that we have on our chips, and that’s going to continue to rise at an exponential rate. The blue line is what the cycle time has been doing. It’s been improving at an exponential rate also, but starting roughly last year and going forward, the cycle time is not going to be enhanced further. In fact, there will probably be a slight decrease. That’s gonna have a tremendous impact in terms of the new products that come out. What that means is that when a new chip comes out, it’s gonna have the same cycle time. So if you’re running on one core of the old chip and one core of the new chip, your application’s gonna run at exactly the same speed. It’s not gonna be enhanced by the new processors because the cycle time will not have changed. You’ll see an enhancement only if you can benefit from the additional cores that are going to be placed on the chip, thanks to the doubling of the gates. Again, that’s gonna have an incredible impact in terms of applications. Now, in scientific computing, we “get it,” meaning we understand this. We’ve been doing parallel processing for a long time. But this affects everybody and all computers; this affects everything in computing.
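The talk doesn’t name it, but the standard way to quantify “you’ll see an enhancement only if you can use the extra cores” is Amdahl’s law. A minimal sketch, assuming a fraction `p` of the program parallelizes perfectly:

```python
def speedup(p, n):
    """Amdahl's law: overall speedup on n cores when a fraction p
    of the work parallelizes perfectly."""
    return 1.0 / ((1.0 - p) + p / n)

# A fully serial program (p = 0) gains nothing from a new chip with
# more cores at the same clock rate:
print(speedup(0.0, 8))             # 1.0 -- exactly the old speed
# A mostly parallel program does benefit from the extra cores:
print(round(speedup(0.9, 8), 2))   # 4.71
```

This is why the doubling of gates becomes a software problem: the hardware delivers more cores, but the speedup depends entirely on the parallel fraction of the code.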
For example, today we have software companies that are terrified by the advent of multicore. They have to change all their software over. Think about Microsoft, Adobe, and MathWorks. Think about all the software companies that have to change all of their software; otherwise, when the new chips come out, their software will not realize the same kinds of enhancements that they’ve seen in the past. That’s gonna be a big deal. That hardware solution that the computer architects have given us, that is, the doubling of the number of gates on the chip--that just became a software problem that we will have to somehow overcome now.
So what’s actually going on here? I mentioned that the frequency is not going to change anymore. From here on out, it’s going to remain roughly constant. Given that, if we take a look at the power that’s consumed by our processors, it goes something like voltage squared times frequency (P ∝ V²F), where the frequency is proportional to the voltage. That means that the power goes something like the frequency cubed. You can see the problem. If I make a change to the frequency, it has a dramatic impact in terms of the power. If I think about the frequency being increased by a factor of two in terms of the speed of these chips, the power requirements go up by a factor of eight! Today our chips are running at 100 W. If I think about 100 W in the area of your fingernail, that is, roughly the size of the chip, that’s an incredible amount of heat that you have to remove! Now think about making that eight times hotter! That becomes a challenge that will be hard to overcome!
Think about a machine that has one core, a voltage of one, a frequency of one, giving us a performance of one and a power requirement of one. Now think about making a change in the frequency, making it run 50% faster. If we do that, that will require a 3.3x increase in the power going into the chip. That’s the problem right there. Making a change in the frequency has a dramatic impact in terms of power. What happens with multicore? With multicore, I can give you two cores on the same chip and reduce the frequency, say, run the frequency at 75% of what we had before, and, if you can use those two cores, then the performance goes up by 1.5 while the power actually goes down to about 0.8. So, multicore has been able to give us, in this context, 50% more performance and 20% less power. That’s the motivating reason for it. But again, we have to use those cores in order to see the advantage of that equation.
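The arithmetic above can be reproduced in a few lines using the P ∝ V²F relation from the talk (with frequency proportional to voltage, so power goes like frequency cubed). Everything is in units relative to the one-core, frequency-one baseline:

```python
def relative_power(freq, cores=1):
    """Power relative to one core at frequency 1 (P ~ cores * f^3)."""
    return cores * freq ** 3

def relative_performance(freq, cores=1):
    """Ideal performance, assuming the software uses every core."""
    return cores * freq

# One core clocked 50% faster: ~3.4x the power for 1.5x performance.
print(round(relative_power(1.5), 2))            # 3.38 (the talk rounds to 3.3)
# Two cores at 75% frequency: 1.5x performance for ~0.84x power.
print(relative_performance(0.75, cores=2))      # 1.5
print(round(relative_power(0.75, cores=2), 2))  # 0.84
```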
Where are we going in the future? When thinking about multicore, don’t think about two or four cores; rather, think much larger. This is gonna be big! AMD has a six-core processor coming out soon. We’re gonna go to eight and even 16 cores in the near future. So we’re going to see many cores on a chip. And those cores are going to be distinct. Some will have characteristics based on conventional processors, and some will be graphics processors on the chip. Some cores will be used for floating point; maybe some cores will be used for integers. We’ll see chips come out that are targeted for different areas--maybe some for the home, some for gaming, maybe for business computing, and maybe for scientific computing--each with a different ratio of those conventional, graphics, and floating point cores.
One of the problems with chips, however, is getting data into the chip. We have chips that have pins on the perimeter, and now we’re talking about increasing the number of computational devices in that area, so we have a limitation on how we get data into this device. That becomes a big bottleneck in terms of computing. The architects at Intel, AMD, and IBM are thinking about stacking memory, that is, not having the data come from the perimeter, but having the data come from a memory that’s placed on top of the processors so the data can move in a vertical direction into the processor cores, thus giving more access, more bandwidth, and ultimately more data going into the processors. So that’s something that’s coming in the near future also.
The other thing that’s happening is this issue with GPUs. Nvidia has a very impressive GPU in use today which is really quite low cost. The GPUs that we see today are in the range of a teraflop of peak performance for 32-bit arithmetic. For 64-bit arithmetic, we’re seeing about 100 gigaflops in terms of performance. Those are very impressive numbers! But the thing is, these are attached processors, meaning that they are attached to the main CPU. What this means is that we have to communicate over a rather slow vehicle to move data from the processor over to the GPU where the computation takes place. We would like to have a more tightly integrated system. Intel has announced its chip in this area, called Larrabee. Larrabee is going to be a “many-core” chip where there will be some conventional and some graphical cores on the same chip. This will perhaps give us a better balance between data movement on the chip and better performance ratios when using this chip, but that’s yet to be seen. We’ll have to wait and see if that happens.
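A back-of-the-envelope sketch shows why that “rather slow vehicle” matters. The link bandwidth below is an illustrative assumption (a few GB/s was typical for host-to-device links of that era), not a figure from the talk; the 100 GFlop/s is the 64-bit peak the talk cites:

```python
LINK_BYTES_PER_SEC = 4e9   # assumed host<->GPU link bandwidth (illustrative)
GPU_FLOPS = 100e9          # 64-bit peak cited in the talk

n = 10_000_000             # vectors of 10 million doubles
bytes_moved = 3 * n * 8    # ship x and y to the GPU, bring the result back
flops = 2 * n              # one multiply + one add per element (an axpy)

transfer_s = bytes_moved / LINK_BYTES_PER_SEC
compute_s = flops / GPU_FLOPS
print(f"transfer {transfer_s*1e3:.1f} ms vs compute {compute_s*1e3:.3f} ms")
# Moving the data takes ~300x longer than the arithmetic, so a kernel
# with low arithmetic intensity sees little benefit from an attached GPU.
```

This is exactly the imbalance a tightly integrated design like Larrabee was meant to reduce: keep the data movement on the chip.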
We have a number of devices that we can look at in terms of increasing performance. We can think about multicore chips, Cell processors, GPUs, and field-programmable gate arrays (FPGAs). We can think about speed improvement. Multicore doesn’t give us great speed improvements, but we can see potential speed improvements from some of the other devices we talked about. In terms of ease of programming, the multicore chip is much easier to program than some of the others. The availability of numerical libraries, the affordability, and the power consumption all go into it. It’s a very complex space, and the real solution is gonna depend on your application and what your purpose is in terms of computing. Whether you want general purpose or special purpose, there’s a mixture of things that could work.
In the future, we will see a many-core chip, and that chip will have a different combination of cores--some conventional, some graphical, and some floating point cores. You place that chip on a board, thus making a shared-memory node: multiple sockets on that board sharing a common pool of memory. Then we will take that board and put it into a rack, thus making a cluster. Someone will then take that rack and fill up a room, making a very large supercomputer. Then you have to start thinking about millions of cores. That number, millions of cores, may seem to be something of a joke, but today we have machines which have hundreds of thousands of cores. So having a machine that uses millions of cores is just on the horizon.
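The hierarchy above multiplies out quickly. The counts below are hypothetical, chosen only to show how chip, board, rack, and room combine to reach millions of cores:

```python
cores_per_chip = 32      # a many-core chip (hypothetical count)
sockets_per_board = 4    # sockets sharing one pool of memory
boards_per_rack = 64     # boards stacked into a cluster rack
racks_per_room = 200     # racks filling the machine room

cores_per_node = cores_per_chip * sockets_per_board  # shared-memory node
cores_per_rack = cores_per_node * boards_per_rack    # one rack of the cluster
total_cores = cores_per_rack * racks_per_room        # the whole supercomputer
print(total_cores)   # 1638400 -- millions of cores, just as the talk predicts
```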