Cycle-accurate simulation methods are necessarily slow. This slowness is only acceptable if the simulation results can be shown to have smaller error than other, faster methods of generating the same results. We use the SESC simulator as a representative of cycle-accurate simulation. We configure it to match an actual SGI MIPS R12000 system as closely as possible. We run the SPEC CPU 2000 benchmarks and compare simulated results against actual hardware performance counters. We find that for CPI, SESC diverges from actual hardware by 24.6% for integer benchmarks and 67.6% for floating point. We then use the Qemu DBI tool as the basis for a faster simulation environment. Qemu generates traces that are consumed by individual simulation processes that run concurrently on a CMP system. We find that for CPI, the results diverge from actual hardware by -1.8% for integer benchmarks and 26.3% for floating point. These Qemu results are obtained an order of magnitude faster than those found with SESC (and with other simulators popular in academia, many of which are much slower). These results show that a re-evaluation of the tradeoffs of cycle-accurate simulation in computer architecture research could have merit. Furthermore, using multiple methods to validate a tool or to investigate a proposed architecture is simply good science.

1 Background

"Cycle-accurate" simulators are one of the prevailing simulation tools in Computer Architecture research. Unfortunately, the results generated by academic¹ cycle-accurate simulators can be misleading due to unknown amounts of error. More importantly, similar results can be generated faster using dynamic binary instrumentation (DBI) based simulation techniques.

¹ Hereafter, when we mention cycle-accurate simulations, we refer to tools and results generated in academia. Industry researchers and developers have created much more accurate simulators, but since their source code is not generally available to academics, we will not discuss them here.

There are a number of problems with cycle-accurate simulators in general:

• Speed: Simulators are slow, often multiple orders of magnitude slower than native execution. Many researchers commonly use "reduced-execution" methods to compensate, yet these methods can compound simulation error if not carefully applied. For instance, Yi et al. [19] find that various reduced-execution methods can add large errors, never less than 5%.

• Obscurity: The simulation tools are rarely used outside the specialized field of Computer Architecture research. Since the simulators themselves are rarely used for anything except running a limited set of benchmarks, bugs can lurk in the code base for a long time, and many are possibly never noticed at all.

• Code Forks: Since few people are using the simulators at any given time, the code base quickly becomes unmaintained and fragmented among the groups using it. Bugs may be fixed at different times and at different institutions. The source code diverges so much that when a paper claims to use a particular simulator, the statement may have little meaning, since the code used can differ from the mainline to the point of being unrecognizable.
• Generalization: Simulators are often highly configurable, since the authors often want to create a tool that can be used to model a multitude of different situations. The end result is that a single simulator can model all architectures, but it may model them all equally poorly. Another problem is that the more configurable a simulator is, the easier it is to configure it improperly, often in non-obvious ways. This has been one of the biggest problems for these authors.

• Validation: Most simulators are not validated against real hardware, and when they are, the results are rarely within 10% error, even after extensive effort has been taken to model a known architecture as closely as possible.
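The error figures used here, both the CPI divergences quoted in the abstract and the 10% validation threshold above, read as signed percent differences of the simulated value relative to the hardware measurement. The minimal sketch below (Python) makes that arithmetic concrete under that assumption; the counter values and names are invented placeholders, chosen only so the result matches the 24.6% integer-benchmark figure, and are not data or code from this study.

```python
# Minimal sketch of the CPI-divergence arithmetic (assumed form: signed
# percent error of the simulator relative to hardware).  All counter
# values below are invented placeholders, not data from the study.

def cpi(cycles, instructions):
    """Cycles per instruction."""
    return cycles / instructions

def divergence_pct(sim_cpi, hw_cpi):
    """Signed percent error: positive means the simulator overestimates
    CPI, negative (e.g., -1.8%) means it underestimates it."""
    return 100.0 * (sim_cpi - hw_cpi) / hw_cpi

# Hypothetical counts for one benchmark run.
hw_cpi  = cpi(cycles=1_250_000_000, instructions=1_000_000_000)  # 1.25
sim_cpi = cpi(cycles=1_557_500_000, instructions=1_000_000_000)  # 1.5575

print(f"hardware CPI  = {hw_cpi:.3f}")
print(f"simulated CPI = {sim_cpi:.3f}")
print(f"divergence    = {divergence_pct(sim_cpi, hw_cpi):+.1f}%")  # +24.6%
```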
References

[1] Can trace-driven simulators accurately predict superscalar performance? Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1996.
[2] John Paul Shen et al. Calibration of Microprocessor Performance Models. Computer, 1998.
[3] An Illustration of the Benefits of the MIPS R12000 Microprocessor and OCTANE System Architecture, 1999.
[4] Mark Heinrich et al. FLASH vs. (simulated) FLASH: closing the simulation loop. SIGPLAN Notices, 2000.
[5] Doug Burger et al. Measuring Experimental Error in Microprocessor Simulation. ISCA, 2001.
[6] Patricia J. Teller et al. Just how accurate are performance counters? IEEE International Performance, Computing, and Communications Conference, 2001.
[7] Mikko H. Lipasti et al. Precise and Accurate Processor Simulation, 2002.
[8] Edward S. Davidson et al. TAXI: Trace Analysis for x86 Interpretation. Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2002.
[9] Daniel Citron. MisSPECulation: partial and misleading use of SPEC CPU2000 in computer architecture conferences. ISCA, 2003.
[11] Rajiv Kapoor et al. Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation. 37th International Symposium on Microarchitecture (MICRO-37), 2004.
[12] Douglas M. Hawkins et al. Characterizing and comparing prevailing simulation techniques. 11th International Symposium on High-Performance Computer Architecture, 2005.
[13] Fabrice Bellard. QEMU, a Fast and Portable Dynamic Translator. USENIX Annual Technical Conference, FREENIX Track, 2005.
[14] S. Eranian. Perfmon2: a flexible performance monitoring interface for Linux, 2010.