Detailed cycle-by-cycle simulation has brought a heightened level of empiricism to the science of computer architecture. Recent work examining the faithfulness of the simulation approach, however, has been disturbing. In [1] it was shown that the widely used "constant delay" memory interface is a gross simplification of the memory hierarchy. This was followed by [2, 3], where it was shown that a compounding collection of modeling errors leads to significant inaccuracies. In [4] it was demonstrated that the entire SPECint95 benchmark suite could be eerily summarized by a handful of statistical control parameters. Finally, in this position paper I present some new evidence that the metrics commonly used (cycles, IPC) deserve closer inspection. After this, I briefly discuss a range of simulation topics.

The IPC / cycles / execution-time juggernaut

Figures 1, 2, and 3 depict the percentage of total execution time as a function of instructions per cycle (IPC). This data was collected by modifying the SimpleScalar toolkit to track instruction completions every ten cycles (a sketch of this bookkeeping follows the discussion below). The result is a graph of the amount of time the superscalar processor spends performing well (completing many instructions per cycle) and performing poorly. The often-used IPC metric is really an average over this microscopic execution behavior. The analogy one can draw is that reporting IPC for an application is like saying that a ball rolled down a U-shaped incline comes to a complete stop at the bottom of the U instantaneously. We all recognize that the ball will roll down one side, up the other, and eventually, through friction, come to rest at the bottom. A microprocessor, when viewed on a microscopic timescale, really fluctuates like this, and reporting IPC as a single number masks this behavior. Unfortunately, reporting execution time also hides these microscopic fluctuations.

The focus on average IPC is problematic. On the one hand, we strive to design architectures that make applications execute faster, so it seems logical to study application execution time, or some corollary such as average IPC. On the other hand, the application-architecture interaction is extremely complex, and simply looking at these gross metrics of performance can make proper interpretation difficult.

Currently, sensitivity analysis is key

So if average IPC/execution-time is problematic, what kind of metrics should we use? Until some useful way of transforming simulation results on microscopic timescales into meaningful understanding is developed, sensitivity analysis of IPC and its corollaries (domain-specific performance metrics) …
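To make the time-at-IPC measurement concrete, the sketch below shows one way to turn per-interval completion counts into the kind of time-versus-IPC distribution plotted in Figures 1, 2, and 3. It is a minimal post-processing sketch, not the actual SimpleScalar instrumentation: the ten-cycle sampling interval comes from the text above, but the bin width, function name, input format, and example samples are illustrative assumptions.

# Hypothetical post-processing of per-interval completion counts: bin execution
# time by the IPC observed in each ten-cycle sampling window.  Everything except
# the ten-cycle interval is an assumption made for illustration.
from collections import Counter

SAMPLE_CYCLES = 10   # completions are tallied every ten cycles (as in the text)
BIN_WIDTH = 0.25     # IPC resolution of the histogram (assumed)

def ipc_time_distribution(completions_per_interval):
    """Return {ipc_bin: fraction of total execution time spent at that IPC}."""
    time_per_bin = Counter()
    for completed in completions_per_interval:
        ipc = completed / SAMPLE_CYCLES
        ipc_bin = round(ipc / BIN_WIDTH) * BIN_WIDTH
        time_per_bin[ipc_bin] += SAMPLE_CYCLES      # each window is ten cycles of time
    total_cycles = SAMPLE_CYCLES * len(completions_per_interval)
    return {b: t / total_cycles for b, t in sorted(time_per_bin.items())}

# A run that alternates between bursts and stalls: average IPC is 1.0, yet the
# distribution shows the processor spends 60% of its time completing nothing
# and essentially none of its time actually running at IPC 1.0.
samples = [30, 0, 25, 5, 0, 0, 40, 0, 0, 0]   # instructions completed per window
print(ipc_time_distribution(samples))
print("average IPC:", sum(samples) / (SAMPLE_CYCLES * len(samples)))

The point of the example is the same as the ball analogy: the single average (1.0 IPC here) describes a state the processor almost never occupies.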
[1] Frederic T. Chong, et al. "HLS: combining statistical and symbolic simulation to guide microprocessor designs." Proceedings of the 27th International Symposium on Computer Architecture (ISCA), 2000.
[2] Doug Burger, et al. "Measuring Experimental Error in Microprocessor Simulation." ISCA, 2001.
[3] Mateo Valero, et al. "Errata on 'Measuring Experimental Error in Microprocessor Simulation'." ACM SIGARCH Computer Architecture News, 2002.
[4] James E. Smith, et al. "Modeling program predictability." ISCA, 1998.
[5] Michael E. Wolf, et al. "The cache performance and optimizations of blocked algorithms." ASPLOS IV, 1991.
[6] Trevor N. Mudge, et al. "A performance comparison of contemporary DRAM architectures." ISCA, 1999.
[7] Tse-Yu Yeh, et al. "Understanding branches and designing branch predictors for high-performance microprocessors." Proceedings of the IEEE, 2001.
[8] Margaret Martonosi, et al. "Dynamic thermal management for high-performance microprocessors." Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA), 2001.