MisSPECulation: partial and misleading use of SPEC CPU2000 in computer architecture conferences

A majority of the papers published in leading computer architecture conferences use SPEC CPU2000, or its predecessor SPEC CPU95, which has become the de facto standard for measuring processor and/or memory-hierarchy performance. However, in most cases only a subset of the suite's benchmarks is simulated. For example, 27 papers were published in ISCA 2002: 16 used SPEC CINT2000, 4 used the whole suite, and only 3 explained their omissions.

This paper quantifies the extent of this phenomenon in the ISCA, Micro, and HPCA conferences: 173 papers were surveyed, 115 used benchmarks from SPEC CINT, but only 23 used the whole suite. If the current trend continues, 80% of the papers will use the full CINT2000 suite by the year 2005, a year after CPU2004 is due to be announced.

We claim that results based upon a subset of a benchmark suite are speculative and conflict with Amdahl's Law. The law implies that the speedup of a proposed technique must be presented over the whole suite. Applying the law to several published papers (by statistically supplying values for the missing benchmarks) reduces promising results to average ones: speedups are reduced from 1.42 to 1.16 in one case, from 1.43 to 1.13 in another, and from 1.76 to 1.15 in a third.

Finally, we have found that the disregard for CFP2000 is unwarranted in papers that explore the data cache domain: that suite displays a higher data cache miss rate than CINT2000, which is used more frequently.
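
The Amdahl's Law argument can be illustrated with a minimal sketch. It assumes that every omitted benchmark receives a speedup of 1.0, which is a simplification for illustration and not the statistical projection used in the paper. The benchmark names, run times, and speedups below are hypothetical and are not taken from any surveyed paper.

```python
def suite_speedup(baseline_times, speedups, default_speedup=1.0):
    """Whole-suite speedup: total baseline time / total improved time.

    Benchmarks missing from `speedups` are assumed to run at
    `default_speedup` (1.0 = unchanged), an assumption of this sketch.
    """
    total_old = sum(baseline_times.values())
    total_new = sum(
        t / speedups.get(name, default_speedup)
        for name, t in baseline_times.items()
    )
    return total_old / total_new


# Hypothetical baseline run times in seconds for a four-benchmark suite.
baseline = {"bench_a": 100.0, "bench_b": 100.0,
            "bench_c": 100.0, "bench_d": 100.0}

# Hypothetical speedups, reported only for the simulated subset.
reported = {"bench_a": 2.0, "bench_b": 2.0}

# Speedup computed over the simulated subset alone.
subset_only = suite_speedup({k: baseline[k] for k in reported}, reported)

# Speedup over the whole suite, with omitted benchmarks left unchanged.
whole_suite = suite_speedup(baseline, reported)

print(f"subset only: {subset_only:.2f}")
print(f"whole suite: {whole_suite:.2f}")
```

With these made-up numbers, the speedup measured on the subset shrinks once the two omitted benchmarks are counted, which is the qualitative effect described above. A different choice of `default_speedup`, or per-benchmark estimates for the missing entries, would change the magnitude.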