Achieving Exascale Capabilities through Heterogeneous Computing

This article provides an overview of AMD's vision for exascale computing and, in particular, how heterogeneity will play a central role in realizing that vision. Exascale computing demands extremely high performance within stringent power budgets. Hardware optimized for specific functions is far more energy efficient than implementing those functions on general-purpose cores. At the same time, supercomputer customers strongly prefer not to pay for custom components designed solely for high-end high-performance computing systems, so high-volume GPU technology becomes a natural choice for energy-efficient data-parallel computing. To fully realize the GPU's capabilities, the authors envision exascale nodes built from integrated CPUs and GPUs (that is, accelerated processing units), along with the hardware and software support scientists need to run their experiments effectively on an exascale system. The authors discuss the hardware and software challenges of building a heterogeneous exascale system and describe ongoing research efforts at AMD to realize this vision.
