MilkyWay-2 supercomputer: system and application

On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was ranked the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. We highlight the key architectural features of MilkyWay-2: neo-heterogeneous compute nodes that integrate commodity off-the-shelf processors and accelerators sharing a similar instruction set architecture; powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communication; a proprietary 16-core processor designed for scientific computing; and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We present an extensive evaluation using a wide range of applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed on the system.
