Eccentric and fragile benchmarks

Benchmarks are essential for computer architecture research and performance evaluation. Constructing a good benchmark suite is, however, non-trivial: the suite must be representative, it must exhibit diverse types of behavior, and its benchmarks should not be easy to tweak. This paper uses principal components analysis (PCA), a statistical data analysis technique, to detect differences in behavior between benchmarks, and identifies two specific types of benchmarks. Eccentric benchmarks behave significantly differently from the other benchmarks; they are useful for incorporating diverse behavior into a suite. Fragile benchmarks are weak benchmarks: their execution time is determined almost entirely by a single bottleneck, so removing that bottleneck reduces their execution time disproportionately. The paper argues that fragile benchmarks are not useful and shows how they can be detected by means of workload characterization techniques. Applying these techniques to the SPEC CPU95 and CPU2000 benchmark suites shows that both contain eccentric as well as fragile benchmarks. The notions of eccentric and fragile benchmarks are important both when composing a benchmark suite and when selecting a subset of one.
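The eccentricity test lends itself to a compact sketch. Below is a minimal illustration in Python, not the paper's exact methodology: it assumes a hypothetical matrix X of per-benchmark workload characteristics, standardizes it, projects it onto the first principal components, and flags benchmarks that lie far from the centroid in PC space as eccentric. The characteristic values, the two-component projection, and the 2x-median distance cutoff are all illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical workload-characterization matrix: one row per benchmark,
# one column per program characteristic (e.g., instruction-mix fractions,
# branch misprediction rate, cache miss rates). Values are made up.
benchmarks = ["bench_a", "bench_b", "bench_c", "bench_d"]
X = np.array([
    [0.42, 0.031, 0.012, 0.18],
    [0.40, 0.029, 0.011, 0.17],
    [0.41, 0.030, 0.013, 0.19],
    [0.10, 0.150, 0.090, 0.55],  # deliberately unlike the others
])

# Standardize each characteristic so no single metric dominates,
# then project the benchmarks onto the main principal components.
Z = StandardScaler().fit_transform(X)
scores = PCA(n_components=2).fit_transform(Z)

# A benchmark far from the centroid in PC space behaves unlike the
# rest of the suite. The 2x-median cutoff is an arbitrary choice.
dist = np.linalg.norm(scores - scores.mean(axis=0), axis=1)
cutoff = 2.0 * np.median(dist)
for name, d in zip(benchmarks, dist):
    print(f"{name}: distance = {d:.2f}" + (" (eccentric)" if d > cutoff else ""))
```

In the paper itself the characteristics are measured program properties over full suites; the sketch only mirrors the shape of the computation.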
