The keynote speech given by Dr. Jack Dongarra at HPC Asia 2009 examines the history of high-performance computing (HPC) from its beginnings in the 1950s, through the present, and into the very near future. The talk also takes an in-depth look at the TOP500 supercomputing list, which Dr. Dongarra and several of his colleagues began in 1993.
This extremely educational and enlightening talk also looks at current trends in HPC, such as “many-core” chips and GPUs, and examines future obstacles to the ongoing development of HPC.
When I look at programming these systems from an applications standpoint, I see five important aspects that the programmer has to deal with when programming a many-core chip. The first is dynamic, data-driven execution. We can’t keep the kind of programming model that we’ve had in the past. A fork-join programming model will be very inefficient on the new architectures, so we have to move to a much more data-driven, almost dataflow-like, execution of our applications, perhaps even with tasks executing out of order. Mixed precision is also going to be a very important aspect. On our conventional processors I see a factor of two in performance: Intel and AMD processors today have a factor of two between 32-bit and 64-bit arithmetic. For graphics processors, there’s an order of magnitude difference between single and double precision. To use these systems, we will also have to develop techniques that are “self-adapting.” These systems are complicated, and we can’t really expect the user to understand all the details and get them right. So we will have to embed some of the intelligence into the software and have the software “tune” itself to the environment it’s going to run in: an auto-tuning system.
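The difference between fork-join and data-driven execution can be sketched as a small scheduling simulation. This is a minimal illustration under simplifying assumptions (unlimited cores, no communication cost); the task names, durations, and both functions are hypothetical, and Python’s standard `graphlib` module stands in for a real task runtime.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: name -> (duration, set of prerequisite tasks).
# Durations are made-up units chosen only to illustrate the effect.
TASKS = {
    "A": (1, set()),
    "B": (3, set()),
    "C": (3, {"A"}),
    "D": (1, {"B"}),
}
# The same work expressed as fork-join phases separated by barriers.
PHASES = [["A", "B"], ["C", "D"]]

def fork_join_time(tasks, phases):
    # Every phase ends at a barrier, so it lasts as long as its slowest task.
    return sum(max(tasks[t][0] for t in phase) for phase in phases)

def dataflow_time(tasks):
    # Each task starts as soon as all of its own inputs have finished
    # (assuming enough cores that no task ever waits for a free core).
    graph = {name: deps for name, (_, deps) in tasks.items()}
    finish = {}
    for name in TopologicalSorter(graph).static_order():
        duration, deps = tasks[name]
        start = max((finish[d] for d in deps), default=0)
        finish[name] = start + duration
    return max(finish.values())

print(fork_join_time(TASKS, PHASES))  # prints 6
print(dataflow_time(TASKS))           # prints 4
```

Here the fork-join version takes 6 time units because the barrier makes C wait for B, which it does not need, while the data-driven version finishes in 4 because each task waits only for its own inputs.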
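One well-known way to turn the speed gap between 32-bit and 64-bit arithmetic into usable 64-bit accuracy is mixed-precision iterative refinement: factor the matrix once in single precision, then repeatedly compute the residual in double precision and correct the solution using the cheap single-precision factors. The sketch below is illustrative only. It assumes Python’s `struct` module to emulate single-precision rounding (so it shows the accuracy behaviour, not the speed-up), uses a tiny made-up 3×3 matrix, and skips pivoting; the function names are hypothetical.

```python
import struct

def f32(x):
    # Round a 64-bit Python float to the nearest 32-bit (single-precision) value.
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_f32(a):
    # LU factorization without pivoting (acceptable for the diagonally dominant
    # example below), every operation rounded to single precision.
    # L (unit lower triangular) and U share one array.
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu

def lu_solve_f32(lu, b):
    # Forward substitution with L, then back substitution with U, in single precision.
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = f32(b[i])
        for j in range(i):
            s = f32(s - f32(lu[i][j] * y[j]))
        y[i] = s
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i]
        for j in range(i + 1, n):
            s = f32(s - f32(lu[i][j] * x[j]))
        x[i] = f32(s / lu[i][i])
    return x

def solve_mixed(a, b, iterations=5):
    n = len(b)
    lu = lu_f32(a)            # O(n^3) work, done once in single precision
    x = lu_solve_f32(lu, b)   # initial single-precision solution
    for _ in range(iterations):
        # Residual r = b - A x computed in double precision.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve_f32(lu, r)               # O(n^2) correction in single precision
        x = [x[i] + d[i] for i in range(n)]   # update accumulated in double precision
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
x_true = [1 / 3, 2 / 7, 0.1]
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]

x_single = lu_solve_f32(lu_f32(A), b)
x_mixed = solve_mixed(A, b)
print(max(abs(u - v) for u, v in zip(x_single, x_true)))
print(max(abs(u - v) for u, v in zip(x_mixed, x_true)))
```

On this well-conditioned example the single-precision solve alone is only accurate to roughly single precision, while a few refinement steps bring the error down to roughly double-precision level. Refinement of this kind relies on the matrix being reasonably well conditioned; for ill-conditioned systems the corrections may fail to converge.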
Another issue is fault tolerance. We have systems with hundreds of thousands of processors, and that’s going to go up to a million soon, and we really have no way to address the failure of a processor, a core, or a thread of execution. Our programming model today is MPI, which has no mechanism for recovering from failures: when a failure occurs, our applications fall over. We need to start thinking about mechanisms that let us recover from failure. The final point I want to make is about communication avoidance, which means coming up with approaches and algorithms that try to minimize communication between the various components of the machine, because of the complexities involved and the issues of timing.
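One way to see why minimizing communication matters is the standard latency-bandwidth (“alpha-beta”) cost model, in which every message pays a fixed startup latency plus a per-word transfer cost. The sketch assumes messages are sent one after another with no overlap; the latency and per-word figures are illustrative values, not measurements of any real machine, and the function names are hypothetical.

```python
def message_time(words, alpha, beta):
    # Latency-bandwidth model: fixed startup cost alpha per message
    # plus a transfer cost of beta per word.
    return alpha + beta * words

def transfer_time(total_words, n_messages, alpha, beta):
    # Send total_words split evenly into n_messages messages, one after another.
    words_per_message = total_words / n_messages
    return n_messages * message_time(words_per_message, alpha, beta)

# Illustrative (hypothetical) parameters: 1 microsecond latency, 1 ns per word.
ALPHA, BETA = 1e-6, 1e-9
TOTAL = 1_000_000  # words to move

for n in (1, 1_000, 1_000_000):
    print(n, transfer_time(TOTAL, n, ALPHA, BETA))
```

With these numbers, moving the same million words costs about 1 ms as one message, about 2 ms as 1,000 messages, and about 1 s as a million one-word messages, which is why communication-avoiding approaches try to reduce the number of messages as well as the volume of data moved.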
So we have Petascale computing today, and in the short term we’ll be at Exascale, that is, 10¹⁸ floating-point operations per second, which is likely to become a reality around 2017. That system will have between 10,000,000 and 100,000,000 processing elements. There will probably be 3D packaging that stacks memory. There will be an optically based interconnect to move data around within the system. It will have somewhere between 10 and 100 petabytes of primary memory, and it will have optical I/O channels to move data in and out. There will have to be hardware- and software-based fault tolerance built into the system as well. One of the limiting factors in building these machines is going to be the amount of electricity that we need to power and cool them.
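A quick back-of-the-envelope check of these figures, assuming decimal units (1 PB = 10¹⁵ bytes) and that the flop rate and memory are spread evenly over the processing elements (PEs); the ranges simply combine the quoted end points.

```latex
\begin{aligned}
\text{rate per PE} &= \frac{10^{18}\ \text{flop/s}}{10^{7}\ \text{to}\ 10^{8}\ \text{PEs}}
  = 10^{11}\ \text{to}\ 10^{10}\ \text{flop/s} \quad (100\ \text{to}\ 10\ \text{Gflop/s}) \\
\text{memory per PE} &= \frac{10^{16}\ \text{to}\ 10^{17}\ \text{bytes}}{10^{8}\ \text{to}\ 10^{7}\ \text{PEs}}
  = 10^{8}\ \text{to}\ 10^{10}\ \text{bytes} \quad (100\ \text{MB to}\ 10\ \text{GB}) \\
\text{memory per flop/s} &= \frac{10^{16}\ \text{to}\ 10^{17}\ \text{bytes}}{10^{18}\ \text{flop/s}}
  = 0.01\ \text{to}\ 0.1\ \text{bytes per flop/s}
\end{aligned}
```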
In conclusion, let me say that we need to reinterpret Moore’s law. It’s not about doubling the number of circuits on the chip; rather, we need to think about doubling the number of threads of execution, something like every two years. We need to be prepared for that in our software environment as well. We will need to be able to deal with million-way concurrency, and the programming models that we have today are inadequate. Today we program in a style that goes back to the 60s, in languages like C or FORTRAN, and use what is essentially an assembly language, MPI, to move messages around. It’s a very crude system, and it is not up to the task of making us as productive as we will need to be when we reach these levels of computing. And again, power is going to be one of the limiting factors; it will become an architectural driver that determines whether we can really build these Exascale systems. So let me conclude there and thank my collaborators. If you’re interested in finding out more about the things I’ve talked about, you can go to my website. The easiest way to find it is to go to Google, type in my name, and click on “I’m Feeling Lucky.” Thank you very much for your attention.
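Read as a growth rule, doubling the thread count every two years gives the relation below, where N₀ is the concurrency at some starting time and t is measured in years. The starting value of 10⁵ is only an illustrative stand-in for the “hundreds of thousands” of processors mentioned earlier.

```latex
\begin{aligned}
N(t) &= N_0 \cdot 2^{t/2}, \qquad t = 2\log_2\frac{N(t)}{N_0} \\
N_0 = 10^{5},\ N(t) = 10^{6}: \quad t &= 2\log_2 10 \approx 6.6\ \text{years} \\
\text{growth over one decade:} \quad \frac{N(10)}{N_0} &= 2^{5} = 32
\end{aligned}
```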
Future Trends: Gates, Cores, Cycle Times, Many-Core Chips, and GPUs