
The HPC Asia 2009 keynote speech given by Dr. William T.C. Kramer introduces us to the NCSA, beginning with its history in the field of HPC and continuing with its latest development, the Blue Waters project. The talk also examines the concepts behind the open science-based projects and applications that are slated to run on Blue Waters. Dr. Kramer then concludes with his insight into the challenges of the coming Petascale and Exascale eras.
Part 6: Exascale Era Challenges
So given our Terascale lessons, what are some of the challenges? We’ve talked a little bit already about scalable algorithms that need to rely on increased parallelism. That’s not just the number of cores; it’s also how many instructions or operations an application can generate within each individual core. We’re seeing more instructions needing to be “in flight” in order to get the optimal sustained performance out of the hardware. Another challenge is the move to strong scaling rather than weak scaling. In many of the areas that we talked about, we have been able to increase scaling by increasing the problem size, but that is becoming more and more limited in terms of sustained performance, particularly as the cost of memory and the amount of memory per node become more and more of a constraint. So in some systems you actually have to give up CPUs because you can’t fit the problem into memory with all the CPUs working on it and, therefore, even if you have very high peak performance, you will not really be able to reap the benefit of those cores. So strong scaling is where the focus needs to be in order to make use of the full capabilities of the systems.
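To make the strong-versus-weak scaling distinction concrete, here is a minimal Python sketch. It is not from the talk: it assumes the standard textbook models (Amdahl's law for a fixed problem size, Gustafson's law for a problem that grows with the core count), and the function names, the 5% serial fraction, and the node sizes are illustrative assumptions only.

```python
def strong_scaling_speedup(serial_fraction, cores):
    """Amdahl's law: speedup when the total problem size stays fixed."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def weak_scaling_speedup(serial_fraction, cores):
    """Gustafson's law: scaled speedup when the problem grows with the cores."""
    return serial_fraction + (1.0 - serial_fraction) * cores


def usable_cores_per_node(cores_per_node, node_memory_gb, memory_per_rank_gb):
    """Cores that can actually run if every rank needs a fixed amount of memory."""
    ranks_that_fit = int(node_memory_gb // memory_per_rank_gb)
    return min(cores_per_node, ranks_that_fit)


if __name__ == "__main__":
    serial = 0.05  # assumed: 5% of the work cannot be parallelized
    for cores in (1, 16, 256, 4096):
        print(cores,
              round(strong_scaling_speedup(serial, cores), 2),
              round(weak_scaling_speedup(serial, cores), 2))

    # Assumed node: 16 cores, 32 GB, and each rank needs 4 GB of memory.
    print(usable_cores_per_node(16, 32, 4))
```

Under these assumptions the weak-scaling number keeps growing with the core count, while the strong-scaling speedup flattens out, and the last call shows how a memory-bound problem can leave half of a node's cores idle.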
New programming methods. What I’m referring to here are the global address space programming models such as UPC and Co-Array Fortran (CAF), and the underlying communication layers that allow those to work at high degrees of efficiency. Remote memory interconnects that enable that to happen in a distributed-memory system are also a very important challenge. New computing technologies such as multicore and many-core chips are coming, and we have to be able to accommodate those in our application space as well as in the way we put the system together. And then there’s the ability to accommodate heterogeneous chips for the applications that can make use of them. The challenge there is how to make it easy enough for the application space to use them. In other words, how do we automate the generation of that code rather than writing two or three different programs, one for each type of chip that’s in a system? And then finally there is enhancing reliability. There are indications that we can enhance that via software at the system level, but we can also enhance it by understanding some things at the application level, that is, by enhancing the resilience of the applications: being able to restart, and being able to migrate during a failure or a potential failure.
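As one illustration of application-level resilience, the ability to restart can be sketched as a simple checkpoint/restart loop. This is a hypothetical Python example, not the mechanism used on Blue Waters: the function names, the JSON checkpoint file, and the `fail_at` argument that simulates a failure are all assumptions made for the sketch.

```python
import json
import os


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temporary file and rename it, so a crash in the middle of
    a write does not damage the previous checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, n_steps, checkpoint_every=10, fail_at=None):
    """Resume from the last checkpoint and run until n_steps are done."""
    state = load_checkpoint(path)
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If a run fails at step 57 with a checkpoint every 10 steps, calling `run` again with the same path resumes from step 50 rather than from the beginning. The checkpoint interval is the design choice here: shorter intervals lose less work on failure but spend more time writing state.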







