Abstract

I - Introduction

The Intel chip based personal computer world is awash with benchmarks. Open any personal computer magazine and you will find personal computers rated with WinMarks, iComps, Bytemarks, or some other benchmark. On the other hand, RISC (Reduced Instruction Set Computer) and Unix operating system workstations are usually benchmarked with the SPECMarks from the Standard Performance Evaluation Corporation (SPEC). While there are other benchmarks for RISC machines, most people agree that SPECint and SPECfp provide a standard for comparing the integer and floating point capabilities of different types of RISC computers. Two of the most important characteristics of a networked RISC workstation are the CPU's integer and floating point performance; the SPECMarks measure these two features very well.

The SPECMarks also run on Intel based personal computers and the statistics are available [SPEC-b], but they are not normally used to evaluate personal computers. There are several reasons the SPECMarks are not frequently used to judge personal computers. To begin with, the SPECMarks are expensive, take the better part of a day to run on any one machine, and require a minimum of 64 Mbytes of main memory [Yager]. In addition to not being practical for the average person to run, the SPECMarks only test CPU and memory performance. They do not test many of the components personal computer users want measured, such as disk I/O and video speed.

The SPECMarks have other problems beyond being impractical for many machines. The SPECMarks commonly available from the manufacturers and other sources usually do not list important factors that are needed to make valid comparisons. When a benchmark is run there are many characteristics that should be reported, such as the operating system type and version, the size of main memory and cache, and the compiler and compiler settings used to create the executable code [Ahmad, Fleming, Muchsel, and Sill]. These and many other factors are usually missing from the manufacturers' technical reports; often only the SPECMarks themselves are available.

Because of these problems with the SPECMarks, we developed a small set of benchmarks to compare different CPU types, memory systems, and operating systems. We wanted the benchmarks to scale well across different CPU types and operating systems. For example, if a benchmark test runs at a certain speed on one computer, a second computer that is identical to the first except for the CPU should run the test faster or slower by a factor depending only on the difference in the CPU. Similarly, if the benchmark is testing memory access or capacity, the size and speed of the memory can change the results. At the same time, the tests must provide a fair assessment of a computer's overall capabilities and allow comparisons of different computers and operating systems. If there are anomalies in the data the benchmarks produce, these anomalies should be explainable by differences in CPU, memory architecture, or operating system. Once the tests are run, a fair comparison can be made across the different CPU types and operating systems.
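The design goal above, that a purely CPU-bound test should speed up or slow down only with the CPU, is easiest to see with a small example. The sketch below, in C, times a fixed integer workload and reports operations per second; the iteration count, the loop body, and the use of clock() are illustrative choices and are not the benchmarks developed in this paper.

/*
 * Minimal sketch of a CPU-bound integer benchmark kernel: a fixed
 * workload is timed and reported as operations per second.  On two
 * machines that differ only in CPU, the ratio of the reported rates
 * should track the CPU difference.  Illustrative only.
 */
#include <stdio.h>
#include <time.h>

#define ITERATIONS 100000000L   /* fixed workload size (assumed value) */

int main(void)
{
    volatile long sum = 0;      /* volatile keeps the loop from being optimized away */
    clock_t start, end;
    double seconds;
    long i;

    start = clock();
    for (i = 0; i < ITERATIONS; i++)
        sum += i % 7;           /* simple integer work in the loop body */
    end = clock();

    seconds = (double)(end - start) / CLOCKS_PER_SEC;
    printf("elapsed: %.3f s, %.1f M ops/s\n",
           seconds, ITERATIONS / seconds / 1e6);
    return 0;
}

A real suite would repeat such kernels for memory- and OS-dependent operations as well, and would record the operating system, memory size, compiler, and compiler settings alongside the results, as argued above.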
References

[1] Philip J. Fleming, et al. "How not to lie with statistics: the correct way to summarize benchmark results," CACM, 1986.
[2] Michael D. Smith, et al. "The measured performance of personal computer operating systems," SOSP, 1995.
[3] Reinhold Weicker, et al. "Dhrystone: a synthetic systems programming benchmark," CACM, 1984.
[4] David A. Patterson, et al. "The case for the reduced instruction set computer," CARN, 1980.
[5] Ron Fox. "Why MIPS are meaningless," 1988.
[6] Bill Nicholls. "That “B” word," 1988.
[7] Walter J. Price. "A benchmark tutorial," IEEE Micro, 1989.
[8] M. Ahmad. "The Computing Speed of a New Machine," Comput. J., 1973.
[9] Reinhold Weicker, et al. "An overview of common benchmarks," Computer, 1990.
[10] A. Korn, et al. "How fast is fast," 1994.