Accurate statistical approaches for generating representative workload compositions

Composing a representative workload is a crucial step in the design process of a microprocessor. The workload should be representative of the target application domain, yet redundancy within the workload should be minimized so that total simulation time is not increased unnecessarily. There is therefore an important trade-off between workload representativeness and simulation accuracy on the one hand, and simulation speed on the other. Previous work used statistical data analysis techniques to select a representative subset of benchmarks and corresponding inputs from a large set of candidate benchmarks and inputs. These methodologies measure a number of program characteristics, apply principal components analysis (PCA) to them, and then use cluster analysis to identify distinct program behaviors among the benchmarks. In this paper we propose independent components analysis (ICA) as a better alternative to PCA. Unlike PCA, ICA does not assume that the original data set has a Gaussian distribution; it seeks statistically independent rather than merely uncorrelated components, which allows it to better identify the important axes in the workload space. Our experimental results using the SPEC CPU2000 benchmarks show that ICA significantly outperforms PCA, yielding benchmark subsets that are both smaller and more accurate than those found by PCA.
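The claim that ICA finds the important axes better than PCA when the data are non-Gaussian can be illustrated on toy data. The following is a minimal pure-Python sketch, not the paper's experimental setup: two independent, uniformly distributed (hence non-Gaussian) hypothetical sources are mixed by a made-up matrix `A` into two observed "program characteristics". PCA, computed from the closed-form eigenvectors of the 2×2 covariance matrix, returns orthogonal maximum-variance axes. ICA, here the deflationary FastICA fixed-point rule of Hyvärinen and Oja with the cubic (kurtosis) nonlinearity applied after whitening, searches for maximally non-Gaussian directions instead. The helper names (`pca_2d`, `fastica`, `best_match`), the mixing matrix and the data are all illustrative assumptions, and the abstract does not say which ICA algorithm or settings the paper used.

```python
# Illustrative sketch: all helper names and data are hypothetical, not from the paper.
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def center(rows):
    """Subtract column means so every observed characteristic has mean zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(r, means)] for r in rows]


def pca_2d(rows):
    """Closed-form eigenpairs of a 2x2 covariance matrix, largest eigenvalue first.

    Assumes the off-diagonal covariance is non-zero (true for the toy data below).
    """
    n = len(rows)
    a = sum(r[0] * r[0] for r in rows) / n
    b = sum(r[0] * r[1] for r in rows) / n
    c = sum(r[1] * r[1] for r in rows) / n
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b * b)
    pairs = []
    for lam in ((a + c) / 2 + half_gap, (a + c) / 2 - half_gap):
        # (b, lam - a) solves [[a, b], [b, c]] v = lam v whenever b != 0.
        pairs.append((lam, normalize([b, lam - a])))
    return pairs


def fastica(white, iters=50, seed=1):
    """Deflationary FastICA fixed-point rule with the cubic nonlinearity g(u) = u**3.

    `white` must be whitened (zero mean, identity covariance); returns unit vectors.
    """
    n, k = len(white), len(white[0])
    rng = random.Random(seed)
    ws = []
    for _ in range(k):
        w = normalize([rng.gauss(0, 1) for _ in range(k)])
        for _ in range(iters):
            proj = [dot(w, z) for z in white]
            mean_gprime = 3 * sum(p * p for p in proj) / n  # E[g'(w.z)] with g' = 3u^2
            # w <- E[z g(w.z)] - E[g'(w.z)] w  (the comprehension still sees the old w)
            w = [sum(z[j] * p ** 3 for z, p in zip(white, proj)) / n - mean_gprime * w[j]
                 for j in range(k)]
            for u in ws:  # Gram-Schmidt: keep w orthogonal to components already found
                overlap = dot(w, u)
                w = [wi - overlap * ui for wi, ui in zip(w, u)]
            w = normalize(w)
        ws.append(w)
    return ws


def abs_corr(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / math.sqrt(sxx * syy)


def best_match(estimates, sources):
    """For each true source, the best |correlation| achieved by any estimated axis."""
    return [max(abs_corr(e, s) for e in estimates) for s in sources]


# Hypothetical toy data: two independent, uniform (non-Gaussian) "behaviours"
# mixed into two observed characteristics by a made-up matrix A.
rng = random.Random(42)
sources = [[rng.uniform(-1, 1) for _ in range(1000)] for _ in range(2)]
A = [[1.0, 0.6], [0.3, 1.0]]
rows = center([[dot(A[i], s) for i in range(2)] for s in zip(*sources)])

pairs = pca_2d(rows)
pca_scores = [[dot(r, v) for r in rows] for _, v in pairs]
# Whitening: project on the principal axes and rescale each to unit variance.
white = [[dot(r, v) / math.sqrt(lam) for lam, v in pairs] for r in rows]
ica_scores = [[dot(w, z) for z in white] for w in fastica(white)]

print("PCA best match per source:", [round(x, 3) for x in best_match(pca_scores, sources)])
print("ICA best match per source:", [round(x, 3) for x in best_match(ica_scores, sources)])
```

On this toy data each principal axis is a blend of both sources, so its best correlation with either source should stay well below 1, while the ICA axes should match the sources closely, up to sign and order (which ICA cannot determine).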
