Enhancing Multiprocessor Architecture Simulation Speed Using Matched-Pair Comparison

While cycle-level, full-system architecture simulation tools are capable of estimating performance at arbitrary accuracy, the time to simulate an entire application is usually prohibitive. Moreover, simulating multi-threaded applications further exacerbates this problem as most simulation tools are single-threaded. Recently, statistical sampling techniques, such as SMARTS, have managed to bring down the simulation time significantly by making it possible to only simulate about 1% of the code with sufficient accuracy. However, thousands of simulation points throughout the benchmark must still be simulated. First of all, we propose to use the well-established statistical method matched-pair comparison and motivate why this will bring down the number of simulation points needed to achieve a given accuracy. We apply it to single-processor as well as multiprocessor simulation and show that it is capable of reducing the number of needed simulation points by one order of magnitude. Secondly, since we apply the technique to single- as well as multiprocessors, we study for the first time the efficiency of statistical sampling techniques in multiprocessor systems to establish a baseline to compare with. We show theoretically and confirm experimentally, that while the instruction throughput vary significantly on each individual processor, the variability of instruction throughput across processors in a multiprocessor system decreases as we increase the number of processors for some important workloads. Thus, a factor of P fewer simulation points, where P is the number of processors, are needed to begin with when sampling is applied to multiprocessors

[1]  Wen-Hann Wang,et al.  On the inclusion properties for multi-level cache hierarchies , 1988, ISCA '88.

[2]  Thomas F. Wenisch,et al.  An Evaluation of Stratified Sampling of Microarchitecture Simulations , 2004 .

[3]  A. Winsor Sampling techniques. , 2000, Nursing times.

[4]  Håkan Grahn,et al.  SimICS/Sun4m: A Virtual Workstation , 1998, USENIX Annual Technical Conference.

[5]  David A. Wood,et al.  Full-system timing-first simulation , 2002, SIGMETRICS '02.

[6]  David A. Wood,et al.  Variability in architectural simulations of multi-threaded workloads , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[7]  Brad Calder,et al.  Basic block distribution analysis to find periodic behavior and simulation points in applications , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[8]  Brad Calder,et al.  A co-phase matrix to guide simultaneous multithreading simulation , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[9]  Shelley Chen Direct SMARTS : Accelerating Microarchitectural Simulation Through Direct Execution , 2004 .

[10]  G. Meek Mathematical statistics with applications , 1973 .

[11]  Roland E. Wunderlich,et al.  SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[12]  Jianwei Chen,et al.  SimWattch: An Approach to Integrate Complete-System with User-Level Performance/Power Simulators , 2003 .

[13]  Rajiv Kapoor,et al.  Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[14]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[15]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[16]  Brad Calder,et al.  Picking statistically valid and early simulation points , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[17]  Lizy Kurian John,et al.  Efficiently Evaluating Speedup Using Sampled Processor Simulation , 2004, IEEE Computer Architecture Letters.

[18]  Sarita V. Adve,et al.  RSIM Reference Manual: Version 1.0 , 1997 .

[19]  Balaram Sinharoy,et al.  POWER4 system microarchitecture , 2002, IBM J. Res. Dev..