Automatic testcase synthesis and performance model validation for high performance PowerPC processors

The latest high-performance IBM PowerPC microprocessor, the POWER5 chip, poses challenges for performance model validation. The current state of the art is to use simple hand-coded bandwidth and latency testcases, but these are not comprehensive for processors as complex as the POWER5 chip. Applications and benchmark suites such as SPEC CPU are difficult to set up or take too long to execute on functional models or even on detailed performance models. We present an automatic testcase synthesis methodology to address these concerns. By basing testcase synthesis on the workload characteristics of an application, source code is created that largely represents the performance of the application but executes in a fraction of the runtime. We synthesize representative PowerPC versions of the SPEC2000, STREAM, TPC-C, and Java benchmarks, compile and execute them, and obtain an average IPC within 2.4% of the average IPC of the original benchmarks, with many similar average workload characteristics. The synthetic testcases often execute two orders of magnitude faster than the original applications, typically in less than 300K instructions, making performance model validation for today's complex processors feasible.
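As a rough illustration of the idea of synthesizing a testcase from workload characteristics, the sketch below draws a short sequence of instruction classes from a profiled instruction mix and computes the relative IPC error used to compare a synthetic testcase with its original. It is not the authors' method: the function names, the instruction classes, and the profile values are all hypothetical, and a real synthesizer would also have to model characteristics such as dependences, memory access patterns, and branch behavior.

```python
import random


def synthesize_block(instr_mix, length, seed=0):
    """Draw `length` instruction classes at random, weighted by `instr_mix`.

    `instr_mix` maps a (hypothetical) instruction class name to its fraction
    of the dynamic instruction count in the profiled application.
    """
    rng = random.Random(seed)  # fixed seed keeps the testcase reproducible
    classes = list(instr_mix)
    weights = [instr_mix[c] for c in classes]
    return rng.choices(classes, weights=weights, k=length)


def relative_ipc_error(ipc_original, ipc_synthetic):
    """Relative difference between synthetic and original IPC."""
    return abs(ipc_synthetic - ipc_original) / ipc_original


# Hypothetical profile; these fractions are illustrative, not measured data.
profile = {"load": 0.25, "store": 0.10, "int_alu": 0.45, "fp": 0.05, "branch": 0.15}
block = synthesize_block(profile, 1000)

# Hypothetical IPC values chosen only to show the arithmetic.
error = relative_ipc_error(1.000, 0.976)
print(f"{len(block)} instructions, relative IPC error = {error:.1%}")
```

Under this sketch, an error of 0.024 corresponds to a synthetic IPC within 2.4% of the original, which is the form of comparison the abstract reports as an average across benchmarks.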
