SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling

Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. We present the sampling microarchitecture simulation (SMARTS) framework as an approach to enable fast and accurate performance measurements of full-length benchmarks. SMARTS accelerates simulation by selectively measuring in detail only an appropriate benchmark subset. SMARTS prescribes a statistically sound procedure for configuring a systematic sampling simulation run to achieve a desired quantifiable confidence in estimates. Analysis of 41 of the 45 possible SPEC2K benchmark/ input combinations show CPI and energy per instruction (EPI) can be estimated to within 3% with 99.7% confidence by measuring fewer than 50 million instructions per benchmark. In practice, inaccuracy in micro-architectural state initialization introduces an additional uncertainty which we empirically bound to /spl sim/2% for the tested benchmarks. Our implementation of SMARTS achieves an actual average error of only 0.64% on CPI and 0.59% on EPI for the tested benchmarks, running with average speedups of 35 and 60 over detailed simulation of 8-way and 16-way out-of-order processors, respectively.

[1]  Janak H. Patel,et al.  Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems , 1988, IEEE Trans. Computers.

[2]  Stanley Lemeshow,et al.  Sampling of Populations: Methods and Applications , 1991 .

[3]  James R. Larus,et al.  The Wisconsin Wind Tunnel: virtual prototyping of parallel computers , 1993, SIGMETRICS '93.

[4]  Sandhya Dwarkadas,et al.  Execution-driven simulation of multiprocessors: address and timing analysis , 1994, TOMC.

[5]  Gary Lauterbach Accelerating architectural simulation by parallel execution of trace samples , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.

[6]  Thomas M. Conte,et al.  Reducing state loss for effective trace sampling of superscalar processors , 1996, Proceedings International Conference on Computer Design. VLSI in Computers and Processors.

[7]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[8]  Sarita V. Adve,et al.  Improving the accuracy vs. speed tradeoff for simulating shared-memory multiprocessors with ILP processors , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[9]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[10]  André Seznec,et al.  Choosing representative slices of program execution for microarchitecture simulations: a preliminary , 2000 .

[11]  Frederic T. Chong,et al.  HLS: combining statistical and symbolic simulation to guide microprocessor designs , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[12]  Kevin Skadron,et al.  Minimal subset evaluation: rapid warm-up for simulated hardware state , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.

[13]  John Flynn,et al.  Adapting the SPEC 2000 benchmark suite for simulation-based computer architecture research , 2001 .

[14]  James E. Smith,et al.  Modeling superscalar processors via statistical simulation , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[15]  Mikko H. Lipasti,et al.  Precise and Accurate Processor Simulation , 2002 .

[16]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[17]  Wei-Chung Hsu,et al.  On the predictability of program behavior using different input data sets , 2002, Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures.

[18]  Thomas F. Wenisch,et al.  Applying SMARTS to SPEC CPU20001 , 2003 .