The keynote speech given by Dr. Jack Dongarra at HPC Asia 2009 examines the history of high performance computing (HPC) from its beginnings in the 1950s, through the present, and into the very near future. The talk also takes an in-depth look at the TOP500 supercomputing list which was begun by Dr. Dongarra and several of his colleagues in 1993.
This extremely educational and enlightening talk also takes a look at current trends in HPC such as “many-core” chips and GPUs as well as examines future obstacles in the ongoing development of HPC.
Part1 | Part2 | Part3 | Part4 | Part5 | Part6 | Part7 [PDF Download] [Video]
Part 5: Power vs. Efficiency (a.k.a. “Flops Per Watt”)
One of the most important aspects of high performance computers today is their power requirements, that is, the amount of electricity necessary to run these machines. What we’re looking at here is the power these systems draw under load, for example, when they are running the benchmark. The #1 machine takes about 2.5 MW of power, and you can look down the list and see how things change. Unfortunately, we don’t have the numbers for the Shanghai Supercomputing Center. The numbers are quite striking: the machine at Oak Ridge National Laboratory uses about 7 MW of power.
This last column tries to reflect the machine’s efficiency, that is to say, its “flops per watt.” You want your machine to have a lot of flops per watt. The highest one in the top 10 is the Roadrunner machine at Los Alamos. I’ll describe its architecture in a moment but, to give you one reason why it is such a fast machine, it uses the IBM Cell processor, which was originally designed for video games. It’s a heterogeneous computer that has three very distinct computer architectures, meaning instruction sets, built into the machine. It is composed of AMD dual-core chips together with IBM Cell processors. The way the architecture works is that each core of the AMD processor gets an IBM Cell chip. It uses the Cell as an accelerator, so think of it as an attached processor. There are 122,000 cores inside the machine in total. When you consider programming this machine, you have to write a program for each of those architectures. So when you write an application, you need to write a program for the AMD processor, one for the PowerPC processor (which is one of the cores of the Cell processor), and then another for the eight vector processors associated with the Cell’s vector architecture. You need to write three programs in order to make this machine run. That’s an incredible challenge to the application writer! Not just one program--you have to write three programs, and you have to pass data between each of them in stages to push data over to the vector processors and then back out to the main memory of the system! So it is a very challenging architecture to program, but it’s a very, very high performance machine indeed!
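As a rough illustration of the flops-per-watt metric (my own sketch, not from the talk), efficiency is simply sustained Linpack performance divided by power drawn under load. The figures below are approximate publicly reported numbers for these two systems from that era, used for illustration only:

```python
# Rough flops-per-watt comparison for two Top500-era systems.
# The performance and power figures are approximate publicly reported
# values, used here purely for illustration.

def mflops_per_watt(linpack_tflops: float, power_mw: float) -> float:
    """Sustained Linpack performance (TFlops) over power draw (MW),
    expressed in MFlops per watt."""
    flops = linpack_tflops * 1e12   # sustained floating-point ops/sec
    watts = power_mw * 1e6          # power under load, in watts
    return flops / watts / 1e6      # MFlops per watt

# Roadrunner (Cell-accelerated) vs. a conventional quad-core Opteron system
print(mflops_per_watt(1105.0, 2.48))   # Roadrunner: roughly 445 MFlops/W
print(mflops_per_watt(1059.0, 6.95))   # Jaguar:     roughly 152 MFlops/W
```

The accelerated machine comes out roughly three times more power-efficient, which is exactly the trade-off against its harder programming model described above.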
At Oak Ridge National Lab, where I work, we have the machine at position #2. It’s a Cray machine based on AMD’s quad-core Opteron processor. Today it has 181,000 cores in it. It has a peak performance of 1.6 PFlops, and on the benchmark it came in at #2. This is what I consider a much more “general purpose” machine: I only have to write one program for it. Just one architecture--the AMD processor; not three. From an applications and productivity standpoint, it may be a more productive machine to use. At the University of Tennessee, where I teach, we also have another fast supercomputer. By the way, the University of Tennessee just became one of the NSF supercomputer centers. The machine we have at the University is also a Cray-based system. Today that system is #15 on the list. Soon it will rise to over a PFlop and, we hope, will be the fastest academic machine on the list very soon.
Power is an incredibly important aspect of computing that affects everything. For example, take a look at where power is being used by some of the data centers. Google recently opened a data center, and its placement was very strategic. When planning that data center, Google purchased an old aluminum smelting plant, which already had a ready source of power. The plant was also located next to the Columbia River, which provides cooling for the system. So they had an easy source of power, an easy source of cooling, and relatively cheap electricity as well. Google opened the plant in 2006. Recently, both Yahoo and Microsoft placed their data centers upriver from the Google plant in the state of Washington, again for the same reasons. The Microsoft data center is an incredible facility! They have 47 MW of power going into it! Just an incredible place! The Microsoft facility is also based on containers: they are able to roll in racks and racks of PCs which make up the data center, and they can provide outsourcing services with those racks.
At Oak Ridge National Lab, there’s a plan to put in place more and more high performance systems, but that comes at a cost, and that cost is power. We estimate our costs to be about USD $11,000,000 for powering and cooling our computers at the lab this year. The projection is that by 2012, that figure will increase to USD $32,000,000 because of the additional computers we will put into place. Clearly, the power cost of these machines, even over a relatively short period of time, is going to equal the price of one of the supercomputers itself! Some of these machines go for about USD $100,000,000; you can think of that as the entry point for some of the larger systems. HPC, and computing in general, has been going through an incredible situation in terms of power consumption.
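To see why power dominates the operating budget at this scale, here is a back-of-the-envelope estimate (my own sketch; the electricity rate is an assumed illustrative value, not a figure from the talk):

```python
# Back-of-the-envelope annual electricity cost for a large machine.
# The $/kWh rate below is an assumed illustrative value, not from the talk.

def annual_power_cost(power_mw: float, usd_per_kwh: float = 0.07) -> float:
    """Annual cost in USD of drawing power_mw megawatts continuously."""
    kw = power_mw * 1000.0          # megawatts -> kilowatts
    hours_per_year = 24 * 365       # continuous operation
    return kw * hours_per_year * usd_per_kwh

# A machine drawing 7 MW around the clock at an assumed $0.07/kWh costs
# roughly $4.3M per year in electricity alone, before cooling overhead --
# the same order of magnitude as the lab-wide figures quoted above.
print(round(annual_power_cost(7.0) / 1e6, 1))
```

Run over a machine’s four- or five-year lifetime, that recurring cost approaches the purchase price, which is the point being made above.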