Enabling Parallel Simulation of Large-Scale HPC Network Systems

With the increasing complexity of today's high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems, networks in particular. To support effective design decisions, simulations of these systems must (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today's IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at flit-level detail, with the Rensselaer Optimistic Simulation System (ROSS) providing parallel discrete-event simulation. Our simulation framework meets all three requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations, which limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today's high-performance cluster systems. Third, our models let network designers simulate a broad range of network workloads, including HPC application workloads driven by detailed network traces, an ability that is rarely offered together with high-fidelity network simulation.
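The optimistic event scheduling mentioned above can be illustrated with a minimal sketch. It assumes a single logical process that tracks the occupancy of a router buffer, and it undoes out-of-order work through reverse handlers, one common way to implement rollback in Time Warp systems. The class, field, and method names (`LogicalProcess`, `delta`, `saved_peak`, `receive`) are hypothetical and are not the ROSS or CODES API; a real simulator would also handle anti-messages and global virtual time, which are omitted here.

```python
class LogicalProcess:
    """Toy logical process: tracks buffer occupancy and its peak value."""

    def __init__(self):
        self.occupancy = 0
        self.peak = 0
        self.processed = []  # events already executed, in timestamp order

    def forward(self, ev):
        # Save what the reverse handler needs, then apply the event.
        ev["saved_peak"] = self.peak
        self.occupancy += ev["delta"]
        self.peak = max(self.peak, self.occupancy)

    def reverse(self, ev):
        # Exactly undo forward() for this event.
        self.occupancy -= ev["delta"]
        self.peak = ev["saved_peak"]

    def receive(self, ev):
        """Execute ev optimistically; roll back first if it is a straggler.

        Returns the number of events that had to be rolled back.
        """
        undone = []
        while self.processed and self.processed[-1]["ts"] > ev["ts"]:
            late = self.processed.pop()
            self.reverse(late)
            undone.append(late)
        # Run the straggler, then re-execute undone events in timestamp order.
        for e in [ev] + undone[::-1]:
            self.forward(e)
            self.processed.append(e)
        return len(undone)


def make_event(ts, delta):
    # Hypothetical event record: timestamp plus change in buffer occupancy.
    return {"ts": ts, "delta": delta}


lp = LogicalProcess()
rollbacks = [
    lp.receive(make_event(1.0, +3)),
    lp.receive(make_event(3.0, -3)),
    lp.receive(make_event(2.0, +2)),  # straggler: arrives after ts=3.0 ran
]
```

In this sketch the straggler at timestamp 2.0 forces one rollback, after which the state matches what a strictly in-order execution would have produced. A conservative scheduler would instead have blocked the event at 3.0 until it was provably safe to run.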
