Cycle-Accurate Microarchitecture Performance Evaluation

We present the design, implementation, and evaluation of a circuit we call the Statistics Module (StatsMod), which captures cycle-accurate performance data at (or above) the microarchitecture layer. The circuit is deployed introspectively, in the architecture itself, on an FPGA that hosts a soft-core implementation of a SPARC architecture (LEON). Accessible over the Internet, the circuit can be dynamically configured (without resynthesis) to capture program-level, function-level, and instruction-level statistics on any subset of predefined VHDL signals. The circuit is deployed outside the actual soft core, so that its operation does not interfere with a program’s execution at any level. In contrast with simulations, StatsMod monitors actual real-time program executions, including runtime artifacts such as multithreading, operating system support, and external interrupts. Furthermore, unlike software-introduced instrumentation, the measurements do not affect the statistics, and microarchitecture characteristics are easily captured. Our design avoids the otherwise combinatorial size of circuitry that would be required to accommodate all methods and events; instead, it scales well with the number of artifacts that are actually measured. We have used this circuit to measure cycle-accurate cache-RAM statistics, such as cache hits and misses and RAM reads and writes, under both write-through and write-back policies. In this paper, we show the scalability of our design as it accommodates more methods and events.

∗This work was sponsored by the National Science Foundation under grant ITR–0313203.
†Contact: cytron@acm.org