Representative Multiprogram Workloads for Multithreaded Processor Simulation

Almost all new consumer-grade processors are capable of executing multiple programs simultaneously. The analysis of multiprogrammed workloads for multicore and SMT processors is challenging and time-consuming because there are many possible combinations of benchmarks to execute and each combination may exhibit several different interesting behaviors. Missing particular combinations of program behaviors could hide performance problems with designs. It is thus of utmost importance to have a representative multiprogrammed workload when evaluating multithreaded processor designs. This paper presents a methodology that uses phase analysis, principal components analysis (PCA) and cluster analysis (CA) applied to microarchitecture-independent program characteristics in order to find important program interactions in multiprogrammed workloads. The end result is a small set of co-phases with associated weights that are representative for a multiprogrammed workload across multithreaded processor architectures. Applying our methodology to the SPEC CPU 2000 benchmark suite yields 50 distinct combinations for two-context multithreaded processor simulation that6 researchers and architects can use for simulation. Each combination is simulated for 50 million instructions, giving a total of 2.5 billion instructions to be simulated for the SPEC CPU2000 benchmark suite. The performance prediction error with these representative combinations is under 2.5% of the real workload for absolute throughput prediction and can be used to make relative throughput comparisons across processor architectures.

[1]  Per Stenström,et al.  Enhancing Multiprocessor Architecture Simulation Speed Using Matched-Pair Comparison , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[2]  Brad Calder,et al.  SimPoint 3.0: Faster and More Flexible Program Phase Analysis , 2005, J. Instr. Level Parallelism.

[3]  Lieven Eeckhout,et al.  Efficient Sampling Startup for Sampled Processor Simulation , 2005, HiPEAC.

[4]  Lieven Eeckhout,et al.  Workload design: selecting representative program-input pairs , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[5]  Brad Calder,et al.  The Strong correlation Between Code Signatures and Performance , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[6]  Lieven Eeckhout,et al.  Measuring Program Similarity: Experiments with SPEC CPU Benchmark Suites , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[7]  Trevor N. Mudge,et al.  Analysis of branch prediction via data compression , 1996, ASPLOS VII.

[8]  Gurindar S. Sohi,et al.  Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors , 1992, MICRO.

[9]  Charles E. Heckler,et al.  Applied Multivariate Statistical Analysis , 2005, Technometrics.

[10]  Steven K. Reinhardt,et al.  The impact of resource partitioning on SMT processors , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[11]  L. Eeckhout,et al.  Exploiting program microarchitecture independent characteristics and phase behavior for reduced benchmark suite simulation , 2005, IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005..

[12]  Brad Calder,et al.  A co-phase matrix to guide simultaneous multithreading simulation , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[13]  Gurindar S. Sohi,et al.  Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors , 1992, MICRO 1992.

[14]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[15]  David A. Wood,et al.  Variability in architectural simulations of multi-threaded workloads , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[16]  Lieven Eeckhout,et al.  Considering all starting points for simultaneous multithreading simulation , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.

[17]  Brad Calder,et al.  Structures for phase classification , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[18]  Nathan L. Binkert,et al.  Network-Oriented Full-System Simulation using M5 , 2003 .

[19]  Greg Hamerly,et al.  SimPoint 3.0: Faster and More Flexible Program Analysis , 2005 .

[20]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.