Reverse State Reconstruction for Sampled Microarchitectural Simulation

Processor simulation involves a tradeoff between speed and accuracy: the more instructions of the workload that are simulated, the more accurate the results, but the higher the cost. A variety of techniques have been introduced to reduce processor simulation times. Statistically sampled simulation is one method that mitigates the cost of simulation while retaining high accuracy. A contiguous group of instructions, called a cluster, is simulated, and a faster form of simulation is then used to skip to the next cluster. Because instructions are skipped, non-sampling bias is introduced and must be removed before accurate measurements can be taken. This paper introduces the reverse state reconstruction warm-up method. While skipping between clusters, the data necessary for reconstruction are recorded. Later, these data are scanned in reverse order so that processor state can be approximated without functionally applying every skipped instruction. By trading storage for speed, the proposed method introduces the concept of on-demand state reconstruction for sampled simulations. Using this technique, the method isolates ineffectual instructions among the skipped instructions without the use of profiling. Compared to SMARTS, reverse state reconstruction achieves maximum and average speedup ratios of 2.45 and 1.64, respectively, with minimal sacrifice of accuracy (less than 0.3%).
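The reverse scan can be illustrated with a small sketch. The abstract does not spell out which structures are reconstructed or how, so the code below assumes the simplest case: a set-associative cache with LRU replacement, and a log holding only the memory addresses touched while skipping. The function names, the log format, and the cache parameters are all hypothetical. Scanning the log newest-first, the first `ways` distinct blocks seen for a set are exactly what forward LRU warming would leave in that set, so older references to a full set are ineffectual and the scan can stop once every set is full.

```python
def forward_warm_lru(log, num_sets, ways, block_bytes=64):
    """Baseline: apply every logged address, oldest first, to an empty LRU cache."""
    sets = [[] for _ in range(num_sets)]  # each list is ordered MRU -> LRU
    for addr in log:
        block = addr // block_bytes
        s = sets[block % num_sets]
        if block in s:
            s.remove(block)          # hit: block will be moved to the MRU position
        elif len(s) == ways:
            s.pop()                  # miss in a full set: evict the LRU block
        s.insert(0, block)
    return sets


def reverse_reconstruct_lru(log, num_sets, ways, block_bytes=64):
    """Sketch (assumed scheme): scan the log newest first and stop early.

    Returns the reconstructed sets and the number of log entries examined.
    """
    sets = [[] for _ in range(num_sets)]  # each list is ordered MRU -> LRU
    full_sets = 0
    scanned = 0
    for addr in reversed(log):
        if full_sets == num_sets:
            break                    # every older reference is ineffectual
        scanned += 1
        block = addr // block_bytes
        s = sets[block % num_sets]
        if len(s) < ways and block not in s:
            s.append(block)          # next most recently used block of this set
            if len(s) == ways:
                full_sets += 1
    return sets, scanned


# Ten logged references to 64-byte blocks 0..5, then 0..3 again.
log = [0, 64, 128, 192, 256, 320, 0, 64, 128, 192]
state, scanned = reverse_reconstruct_lru(log, num_sets=2, ways=2)
print(state, scanned)
```

In this toy log both functions yield the same cache contents, but the reverse scan examines only the four newest entries. When the log is too short to fill every set, the reverse scan leaves those sets partially filled, just as forward warming from an empty cache would.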
