Runtime variability in scientific parallel applications

Simulation remains an important component in the design of multicore processor architectures, just as it was in uniprocessor design. In contrast to single-threaded applications, however, many multi-threaded programs are not deterministic: across multiple runs, even on the same architecture, different execution paths can be taken. This results in variability of simulation results, a phenomenon that is well known in real systems but almost universally ignored in simulation experiments. In this paper, we review existing work on simulation variability. We extend this work, which has focused mainly on commercial workloads, and show that it also applies to scientific workloads. We characterize the variability of the SPLASH-2 benchmark applications and examine how it can affect the optimization of the interconnection network. Both previous results and our own show that studies aiming to demonstrate the benefit of architectural or other modifications should keep this variability in mind and verify that observed improvements are statistically significant relative to the inherent variability, to avoid drawing wrong conclusions. Although this problem has not yet been solved satisfactorily, we review some possible solutions, such as the use of statistics, alternative performance metrics, and sample-based simulation.
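As a minimal illustration of the statistical approach the abstract alludes to (not the authors' own methodology), the sketch below compares two configurations using confidence intervals and Welch's t-test over repeated simulation runs instead of a single-run comparison. The cycle counts, the number of runs, and the configuration names are hypothetical; only NumPy and SciPy are assumed.

```python
import numpy as np
from scipy import stats

# Hypothetical cycle counts from repeated simulations of the same
# workload on a baseline and a modified interconnect configuration.
# Run-to-run differences stem only from nondeterministic thread timing.
baseline = np.array([1.020e9, 0.995e9, 1.043e9, 1.008e9, 0.987e9])
modified = np.array([0.992e9, 1.001e9, 0.975e9, 1.019e9, 0.983e9])

def mean_ci(samples, confidence=0.95):
    """Mean and confidence half-width using the Student t distribution."""
    n = len(samples)
    half = (stats.t.ppf((1 + confidence) / 2, n - 1)
            * samples.std(ddof=1) / np.sqrt(n))
    return samples.mean(), half

b_mean, b_half = mean_ci(baseline)
m_mean, m_half = mean_ci(modified)
print(f"baseline: {b_mean:.4g} +/- {b_half:.3g} cycles")
print(f"modified: {m_mean:.4g} +/- {m_half:.3g} cycles")

# Welch's t-test: is the observed improvement distinguishable from
# the inherent run-to-run variability?
t_stat, p_value = stats.ttest_ind(baseline, modified, equal_var=False)
print(f"Welch's t-test: p = {p_value:.3f}")
if p_value >= 0.05:
    print("Difference not significant at the 5% level; "
          "the 'improvement' may be simulation noise.")
```

With so few runs and overlapping confidence intervals, the test typically fails to reach significance, which is precisely the kind of premature conclusion the paper warns against when only a single run per configuration is measured.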
