Efficiently Evaluating Speedup Using Sampled Processor Simulation

Cycle-accurate simulation of processors is extremely time-consuming. Sampling can greatly reduce simulation time while retaining good accuracy. Previous research on sampled simulation has focused on the accuracy of CPI. However, most simulations are run to evaluate the benefit of a microarchitectural enhancement, for which speedup is a more important metric than CPI. We employ the ratio estimator from statistical sampling theory to design an efficient sampling scheme that measures speedup and quantifies its error. We show that achieving a given relative error limit for speedup does not require estimating CPI to the same accuracy. In our experiments, estimating speedup requires about 9X fewer instructions to be simulated in detail than estimating CPI to the same relative error limit. Using the ratio estimator to evaluate speedup is therefore much more cost-effective and offers great potential for reducing simulation time. We also discuss the reason for this result.
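As an illustration of the idea, the following is a minimal sketch of the textbook ratio estimator applied to speedup, not the paper's implementation. It assumes the same sampling units (same instruction ranges) are simulated on both the baseline and the enhanced configuration, that the sampled fraction is small enough to ignore the finite population correction, and that a normal approximation (z = 1.96 for roughly 95% confidence) is acceptable. The function names and the sample data are hypothetical.

```python
import math


def speedup_ratio_estimate(base_cycles, enh_cycles, z=1.96):
    """Estimate speedup = baseline cycles / enhanced cycles from paired
    sampling units, with an approximate relative error bound.

    base_cycles[i] and enh_cycles[i] are the cycle counts of the SAME
    sampling unit on the two configurations (an assumption of this sketch).
    """
    n = len(base_cycles)
    if n != len(enh_cycles) or n < 2:
        raise ValueError("need at least two paired sampling units")

    total_base = sum(base_cycles)
    total_enh = sum(enh_cycles)
    speedup = total_base / total_enh          # ratio estimate R
    mean_enh = total_enh / n                  # mean of the denominator variable

    # Residuals b_i - R * e_i are small when the two configurations'
    # per-unit cycle counts are strongly correlated.
    resid_ss = sum((b - speedup * e) ** 2
                   for b, e in zip(base_cycles, enh_cycles))

    # Textbook variance approximation for a ratio estimate,
    # with the finite population correction dropped.
    variance = resid_ss / ((n - 1) * n * mean_enh ** 2)
    rel_error = z * math.sqrt(variance) / speedup
    return speedup, rel_error


def mean_estimate(values, z=1.96):
    """Plain sample mean with its relative error bound, for comparison
    with estimating per-unit cycles (and hence CPI) on one configuration."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two sampling units")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    rel_error = z * math.sqrt(variance / n) / mean
    return mean, rel_error


if __name__ == "__main__":
    # Hypothetical cycle counts for five equal-sized sampling units.
    enh = [100, 200, 150, 400, 120]
    base = [210, 390, 310, 790, 250]
    s, s_err = speedup_ratio_estimate(base, enh)
    m, m_err = mean_estimate(base)
    print(f"speedup ~ {s:.3f}, relative error ~ {s_err:.2%}")
    print(f"baseline cycles/unit ~ {m:.1f}, relative error ~ {m_err:.2%}")
```

With data like this, where per-unit cycle counts vary widely across units but move together across the two configurations, the relative error of the speedup estimate is far smaller than that of the per-configuration mean for the same sample. This is a general property of the ratio estimator and is offered only as intuition; it is not a restatement of the paper's own analysis.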
