Transparent, low-overhead profiling on modern processors

Over the past two years, the Digital Continuous Profiling Infrastructure (DCPI) research project at Compaq's Systems Research Center and Western Research Lab has been exploring new ways of profiling computer systems. We have developed the DCPI tools, a suite of software profiling tools that provide transparent, low-overhead (0.5%-3.0% slowdown) profiling of complete systems [1]. The DCPI tools run on Alpha microprocessors under Digital UNIX and Microsoft Windows/NT and are freely available for downloading from http://www.research.digital.com/SRC/dcpi/.

The DCPI tools provide profile information at varying levels of granularity: from whole images down to individual procedures and basic blocks, and on down to detailed information about individual instructions, including information about dynamic behavior such as cache misses, branch mispredicts, and other forms of dynamic stalls. Instruction-level stall information is attributed to the instructions that actually incur such stalls, in contrast with some systems that attribute the information to a nearby instruction. This precise attribution is extremely useful when tuning code.

On in-order processors such as the Alpha 21064 and 21164, the tools rely on periodic cycle counter interrupts and static analysis of an executable image to provide instruction-level information. On out-of-order processors, this approach is not feasible, so we have designed a new form of hardware support for instruction-level information called ProfileMe, which can provide significant insight into the behavior of programs running on complex microprocessors (especially out-of-order processors) [2]. ProfileMe requires only modest hardware modifications and can be used by our DCPI tools to collect detailed profile information without substantial profiling overhead.

The DCPI profiling tools have several characteristics that distinguish them from other profiling tools:
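As a rough illustration of the granularity levels described above, the following sketch rolls a handful of program-counter samples up into per-image, per-procedure, and per-instruction counts. It is not the DCPI data format or API: the sample records, image paths, procedure names, and the `roll_up` function are all hypothetical, and a real sampler would obtain these records from periodic interrupts rather than a hard-coded list.

```python
from collections import Counter

# Hypothetical sample records: (image, procedure, instruction offset).
# These values are made up for illustration only.
samples = [
    ("/usr/bin/app", "parse", 0x10),
    ("/usr/bin/app", "parse", 0x10),
    ("/usr/bin/app", "parse", 0x14),
    ("/usr/bin/app", "emit", 0x40),
    ("/usr/lib/libc.so", "memcpy", 0x08),
]


def roll_up(samples):
    """Count samples at three levels: image, procedure, instruction."""
    by_image = Counter()
    by_procedure = Counter()
    by_instruction = Counter()
    for image, proc, offset in samples:
        by_image[image] += 1
        by_procedure[(image, proc)] += 1
        by_instruction[(image, proc, offset)] += 1
    return by_image, by_procedure, by_instruction


by_image, by_procedure, by_instruction = roll_up(samples)

# Print the hottest entries first at each level.
for image, count in by_image.most_common():
    print(f"{count:4d}  {image}")
for (image, proc), count in by_procedure.most_common():
    print(f"{count:4d}  {image}:{proc}")
for (image, proc, offset), count in by_instruction.most_common():
    print(f"{count:4d}  {image}:{proc}+{offset:#x}")
```

Because every sample carries its full (image, procedure, offset) key, the coarser views are just sums over the finer ones, so one stream of samples can serve all three levels of reporting.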

[1] Lance M. Berc, et al. Continuous profiling: where have all the cycles gone? 1997, TOCS.

[2] Jeffrey Dean, et al. ProfileMe: hardware support for instruction-level profiling on out-of-order processors. 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.