A Framework for Tracking Memory Accesses in Scientific Applications

Profiling greatly assists in understanding and optimizing application behavior. Current profiling techniques help developers focus on the pieces of code that incur the highest penalties according to a given performance metric. In this paper we describe a pair of tools that we have extended to complement this traditional algorithm-oriented analysis. The extended tools provide new object-differentiated profiling capabilities that help software developers and hardware designers (1) understand access patterns, (2) identify unexpected access patterns, and (3) determine whether a particular memory object consistently exhibits a troublesome access pattern. Memory objects identified in this way may go unnoticed under the traditional profiling approach. This additional view may lead developers to consider different ways of storing data, different algorithms, or different memory subsystems in future heterogeneous memory systems.
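
To make the idea of object-differentiated profiling concrete, the following is a minimal sketch, not the tools described in this paper. It assumes a memory trace is available as a plain list of addresses and that the address range of each memory object is known; the object names, addresses, the 64-byte line size, and the functions `build_index`, `owner`, and `profile` are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical memory objects: (name, start address, size in bytes).
OBJECTS = [
    ("positions", 0x1000, 0x400),
    ("neighbors", 0x2000, 0x800),
]

def build_index(objects):
    """Sort objects by start address so lookups can use binary search."""
    ordered = sorted(objects, key=lambda o: o[1])
    starts = [o[1] for o in ordered]
    return starts, ordered

def owner(addr, index):
    """Return the name of the object containing addr, or None."""
    starts, ordered = index
    i = bisect_right(starts, addr) - 1
    if i >= 0:
        name, start, size = ordered[i]
        if addr < start + size:
            return name
    return None

def profile(trace, objects, line_size=64):
    """Count accesses and distinct cache lines touched, per object.

    line_size is an assumed cache-line size used only to group addresses.
    """
    index = build_index(objects)
    counts = defaultdict(int)
    lines = defaultdict(set)
    for addr in trace:
        name = owner(addr, index) or "<unknown>"
        counts[name] += 1
        lines[name].add(addr // line_size)
    return {name: (counts[name], len(lines[name])) for name in counts}
```

Calling `profile(trace, OBJECTS)` returns, for each object, a pair of (number of accesses, number of distinct lines touched). An object with many accesses spread over many lines is a candidate for the kind of troublesome access pattern that a purely code-oriented profile would attribute to a function rather than to the data it touches.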
