Improving lookahead in parallel discrete event simulations of large-scale applications using compiler analysis

This paper addresses the efficient and accurate performance prediction of large-scale message-passing applications on high-performance architectures using simulation. Such simulators are often based on parallel discrete event simulation, typically using the conservative protocol to synchronize the simulation threads. The paper considers how a compiler can automatically extract information about the lookahead present in the application, and how this information can be used to improve the performance of the null message protocol used for synchronization. These techniques are implemented in the MPI-Sim simulator and the dHPF compiler, which had previously been extended to work together to optimize the simulation of the local computational components of an application. The results show that the availability of lookahead information improves the runtime of the simulator by amounts ranging from 9% up to two orders of magnitude, with 30-60% improvements being typical for the real-world codes. The experiments also show that these improvements are directly correlated with reductions in the number of null messages required by the simulations.
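To make the link between lookahead and null-message count concrete, the following is a minimal sketch of a toy conservative simulation with two logical processes that exchange only null messages until one of them may safely process an event. It is not MPI-Sim's implementation: the function name `count_null_messages`, the two-process setup, and the integer time units are all assumptions made for illustration.

```python
def count_null_messages(lookahead: int, event_time: int) -> int:
    """Count null messages two logical processes (LPs) exchange before
    LP 0 may safely process an event stamped `event_time`.

    Toy model (hypothetical, for illustration only): each LP may advance
    its clock only up to the latest timestamp promised by its peer, and
    every null message promises "nothing earlier than my clock + lookahead".
    """
    if lookahead <= 0:
        raise ValueError("the null message scheme needs a positive lookahead")

    # safe_until[i]: timestamp below which LP i knows no message can arrive.
    safe_until = [0, 0]
    nulls_sent = 0
    sender = 0  # the LPs take turns, starting with LP 0

    while safe_until[0] < event_time:
        receiver = 1 - sender
        # The sender has advanced to safe_until[sender]; its null message
        # raises the receiver's safe bound by the sender's lookahead.
        safe_until[receiver] = safe_until[sender] + lookahead
        nulls_sent += 1
        sender = receiver

    return nulls_sent


if __name__ == "__main__":
    for la in (1, 10, 100):
        print(f"lookahead={la:3d} -> {count_null_messages(la, 100)} null messages")
```

In this toy model the number of null messages falls roughly in inverse proportion to the lookahead, which is the effect the abstract reports when the compiler supplies larger lookahead values to the simulator.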
