Accelerating multi-core processor design space evaluation using automatic multi-threaded workload synthesis

Designing and evaluating microprocessor architectures is difficult and time-consuming. Small, hand-coded microbenchmarks can accelerate performance evaluation, but they are too simple to stress increasingly complex architecture designs. Larger, more complex real-world workloads are needed to measure the performance of a given design or to compare the efficiency of design alternatives. However, such applications can take days or weeks to run to completion on a detailed architecture simulator. In the past, researchers have applied machine learning and statistical sampling methods to reduce the average number of instructions required for detailed simulation. Others have proposed statistical simulation and workload synthesis techniques, which produce programs that emulate the execution characteristics of the application from which they are derived but run for a much shorter time than the original. These existing methods are difficult to apply to multi-threaded programs, and the simplifications they make can miss the complex interactions between multiple, concurrently running threads. This study develops new techniques for accurate and effective multi-threaded workload synthesis, which can significantly accelerate architecture design evaluation of multi-core processors. We propose constructing synchronized statistical flow graphs that incorporate inter-thread synchronization and sharing behavior to capture the complex characteristics and interactions of multiple threads. In addition, we develop thread-aware data reference models and wavelet-based branching models to generate accurate memory access and dynamic branch statistics. Experimental results show that a framework integrating these models can automatically generate synthetic programs that maintain the characteristics of the original workloads but have significantly reduced runtime.
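As a rough illustration of the general idea behind a statistical flow graph, the sketch below counts block-to-block transitions in a per-thread trace, normalizes them to probabilities, and then random-walks the graph to produce a shorter synthetic trace. It is not the framework described in the abstract: the traces, the block names, the `LOCK`/`UNLOCK`/`BARRIER` markers, and the functions `build_flow_graph` and `synthesize` are all hypothetical, and synchronization is represented only as labeled nodes inside each thread's graph rather than as modeled inter-thread dependencies.

```python
import random
from collections import Counter, defaultdict


def build_flow_graph(trace):
    """Count block-to-block transitions and normalize them to probabilities."""
    counts = defaultdict(Counter)
    for src, dst in zip(trace, trace[1:]):
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(succ.values()) for dst, n in succ.items()}
        for src, succ in counts.items()
    }


def synthesize(graph, start, length, rng):
    """Random walk over the graph; stops early at a block with no successors."""
    out = [start]
    while len(out) < length and out[-1] in graph:
        succ = graph[out[-1]]
        out.append(rng.choices(list(succ), weights=list(succ.values()))[0])
    return out


# Hypothetical per-thread traces of basic-block IDs (assumption, not real data).
# "LOCK", "UNLOCK", and "BARRIER" stand in for synchronization points.
traces = {
    0: ["A", "LOCK", "B", "UNLOCK", "A", "LOCK", "B", "UNLOCK", "C", "BARRIER"],
    1: ["D", "LOCK", "B", "UNLOCK", "D", "D", "LOCK", "B", "UNLOCK", "BARRIER"],
}

# One graph per thread, so thread-specific behavior is not averaged away.
graphs = {tid: build_flow_graph(t) for tid, t in traces.items()}

# Seeded generator so the synthetic traces are reproducible.
rng = random.Random(0)
synthetic = {
    tid: synthesize(g, traces[tid][0], 6, rng) for tid, g in graphs.items()
}
```

Keeping one graph per thread and retaining the synchronization markers as nodes is the simplest way to preserve where each thread synchronizes; a fuller model along the lines the abstract describes would also need to capture which threads interact at those points and what data they share.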
