NStrace: A bus-driven instruction trace tool for PowerPC microprocessors

NStrace is a bus-driven hardware trace facility developed for the PowerPC® family of superscalar RISC microprocessors. It uses a recording of activity on a target processor's bus to infer the sequence of instructions executed during the recording period. NStrace is distinguished from related approaches by its use of an architecture-level simulator to generate the instruction sequence from the bus recording. The generated trace represents the behavior of the processor as it executes at normal speed while interacting normally with its run-time environment. Furthermore, the architectural simulation can provide details of the processor state that are not generally available to other trace mechanisms. Generating a bus-driven instruction trace involves two main steps: bus capture and trace generation. Bus capture is triggered by a call to a system program that puts a particular address on the bus and then establishes the initial state of the processor by a combination of writing out register values and invalidating caches. A logic analyzer records the bus activity, and from this recording a file of bus transactions is produced. Trace generation proceeds by driving a processor simulator with these bus transactions and recording the resulting sequence of instructions. The processor simulator is an elaboration of the one developed for the PowerPC Visual Simulator. We have successfully generated instruction traces for a mix of utility programs and real applications on several microprocessor platforms running several operating systems. The bus recording hardware has a capacity of two million transactions, yielding instruction traces on the order of one hundred million instructions long. This trace facility has been used for a number of studies covering a range of performance issues involving software, hardware, and their interactions.
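
The trace-generation step can be illustrated with a toy sketch. It is not the NStrace implementation: the three-instruction ISA, the four-instruction cache line, and the names `generate_trace`, `LINE`, and `bus_log` are all assumptions made for illustration. The sketch shows only the core idea that a simulator starting from a known register state and empty caches can reconstruct every executed instruction while consuming a bus transaction only when it misses in its own cache model.

```python
from collections import deque

LINE = 4  # instructions per cache line in this toy model (assumption)


def generate_trace(bus_transactions, initial_pc, initial_regs, max_instructions=1000):
    """Replay recorded bus reads through a toy simulator; return (pc, opcode) pairs."""
    bus = deque(bus_transactions)  # recorded (line_address, line_contents) reads, in order
    cache = {}                     # empty at start: caches were invalidated at trigger time
    regs = dict(initial_regs)      # register values written out at trigger time
    pc = initial_pc
    trace = []
    while len(trace) < max_instructions:
        line_addr = pc - pc % LINE
        if line_addr not in cache:
            if not bus:
                break              # recording exhausted: the trace ends here
            addr, contents = bus.popleft()
            if addr != line_addr:
                raise ValueError(f"simulator diverged: expected line {line_addr}, bus shows {addr}")
            cache[line_addr] = contents
        op, *args = cache[line_addr][pc % LINE]
        trace.append((pc, op))
        if op == "addi":           # rd = ra + imm
            rd, ra, imm = args
            regs[rd] = regs[ra] + imm
            pc += 1
        elif op == "bnez":         # branch to target if ra != 0
            ra, target = args
            pc = target if regs[ra] != 0 else pc + 1
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return trace


# Hypothetical recording: two instruction-line fetches seen on the bus.
bus_log = [
    (0, [("addi", "r1", "r1", -1), ("bnez", "r1", 0),
         ("addi", "r2", "r2", 1), ("addi", "r2", "r2", 1)]),
    (4, [("halt",)] * LINE),
]
trace = generate_trace(bus_log, initial_pc=0, initial_regs={"r1": 3, "r2": 0})
print(len(trace), "instructions from", len(bus_log), "bus transactions")
```

In this toy run the loop body executes three times out of the simulated cache, so nine instructions are recovered from two bus transactions. The same effect, at a much larger scale, is why a recording of two million transactions can correspond to a trace of roughly one hundred million instructions.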
