Detailed and clock-driven simulation for HPC interconnection network

The performance and energy consumption of high-performance computing (HPC) interconnection networks strongly affect the supercomputer as a whole, so an HPC interconnection network simulation platform is an important tool for research on HPC software and hardware technologies. To evaluate the performance and energy consumption of HPC interconnection networks effectively, this article designs and implements HPC-NetSim, a detailed, clock-driven HPC interconnection network simulation platform. HPC-NetSim uses application-driven workloads and inherits the characteristics of a detailed and flexible cycle-accurate network simulator. It also offers a large set of configurable network parameters covering topology and routing, and it supports on/off states for routers. Compared with real execution times measured on a Tianhe-2 subsystem, the simulated execution times show a mean error of only 2.7%. In addition, we simulate network behavior under different network structures and low-power modes, and the results are consistent with theoretical analyses.
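Two ideas in the abstract can be made concrete with a toy model: clock-driven simulation, where every router's state is advanced on every cycle whether or not anything happened to it, and router on/off states, where a sleeping router must pay a wake-up cost before it can forward traffic. The Python sketch below is not HPC-NetSim. It is a hypothetical 2D-mesh model with dimension-order (XY) routing, and the names (`simulate`, `Router`, `Packet`, `request_wake`), the idle threshold, the wake-up latency, and the one-packet-per-router-per-cycle contention rule are all assumptions chosen for illustration. The `mean_relative_error` helper likewise assumes that the reported mean error is the average of |T_sim - T_real| / T_real over the tested runs, which the abstract does not define.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical parameters chosen for illustration only.
IDLE_THRESHOLD = 4  # idle cycles before a router switches itself off
WAKE_LATENCY = 3    # cycles needed to switch a router back on

Coord = tuple[int, int]


@dataclass
class Router:
    on: bool = True
    idle: int = 0    # consecutive cycles without forwarding or holding a packet
    waking: int = 0  # remaining wake-up cycles (0 means no wake-up in progress)


@dataclass
class Packet:
    pos: Coord                    # router currently holding the packet
    dst: Coord
    inject: int                   # cycle at which the packet enters the network
    arrive: Optional[int] = None  # cycle at which it reached dst


def next_hop(pos: Coord, dst: Coord) -> Coord:
    """Dimension-order (XY) routing: correct X first, then Y."""
    x, y = pos
    if x != dst[0]:
        return (x + (1 if dst[0] > x else -1), y)
    return (x, y + (1 if dst[1] > y else -1))


def request_wake(r: Router) -> None:
    """Start waking a sleeping router unless a wake-up is already pending."""
    if not r.on and r.waking == 0:
        r.waking = WAKE_LATENCY


def simulate(width: int, height: int, packets: list[Packet], max_cycles: int = 1000) -> int:
    """Return the number of cycles simulated until every packet has arrived."""
    routers = {(x, y): Router() for x in range(width) for y in range(height)}
    for cycle in range(max_cycles):
        busy = set()  # routers that forwarded a packet this cycle
        for p in packets:
            if p.arrive is not None or p.inject > cycle:
                continue
            if p.pos == p.dst:
                p.arrive = cycle
                continue
            here, nxt = routers[p.pos], next_hop(p.pos, p.dst)
            if not here.on or not routers[nxt].on:
                request_wake(here)
                request_wake(routers[nxt])
                continue  # stall until both routers are powered on
            if p.pos in busy:
                continue  # simple contention: one packet per router per cycle
            busy.add(p.pos)
            p.pos = nxt
            if p.pos == p.dst:
                p.arrive = cycle
        # Routers holding an in-flight packet are never counted as idle.
        held = {p.pos for p in packets if p.arrive is None and p.inject <= cycle}
        # Clock-driven step: every router's state advances on every cycle.
        for coord, r in routers.items():
            if r.waking > 0:
                r.waking -= 1
                if r.waking == 0:
                    r.on, r.idle = True, 0
            elif coord in busy or coord in held:
                r.idle = 0
            elif r.on:
                r.idle += 1
                if r.idle >= IDLE_THRESHOLD:
                    r.on = False
        if all(p.arrive is not None for p in packets):
            return cycle + 1
    return max_cycles


def mean_relative_error(simulated: list[float], real: list[float]) -> float:
    """Mean of |T_sim - T_real| / T_real over all runs (assumed definition)."""
    return sum(abs(s - r) / r for s, r in zip(simulated, real)) / len(real)
```

With these assumed parameters, a packet crossing three hops of an idle-free 4x4 mesh finishes in 3 cycles, while a packet injected at cycle 10 into a two-router mesh finds both routers asleep and needs 14 cycles instead of 11, the extra 3 cycles being the wake-up latency. Waking the current and next-hop routers together is a design choice made here to keep the sketch short; a real low-power scheme may wake routers earlier or one at a time.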
