HMTT: A hybrid hardware/software tracing system for bridging the DRAM access trace's semantic gap

DRAM access traces (i.e., off-chip memory references) can be extremely valuable for the design of memory subsystems and performance tuning of software. Hardware snooping on the off-chip memory interface is an effective and nonintrusive approach to monitoring and collecting real-life DRAM accesses. However, compared with software-based approaches, hardware snooping approaches typically lack semantic information, such as process/function/object identifiers, virtual addresses, and lock contexts, that is essential to the complete understanding of the systems and software under investigation. In this article, we propose a hybrid hardware/software mechanism that is able to collect off-chip memory reference traces with semantic information. We have designed and implemented a prototype system called HMTT (Hybrid Memory Trace Tool), which uses a custom-made DIMM connector to collect off-chip memory references and a high-level event-encoding scheme to correlate semantic information with memory references. In addition to providing complete, undistorted DRAM access traces, the proposed system is also able to perform various types of low-overhead profiling, such as object-relative accesses and multithread lock accesses.
