A tool for characterizing and succinctly representing the data access patterns of applications

Application address streams contain a wealth of information that can be used to characterize the behavior of applications. However, address streams are large and expensive to collect, which complicates both gathering and handling them. We present PSnAP, a compression scheme specifically designed for capturing the fine-grained patterns that occur in well-structured, memory-intensive, high-performance computing applications. PSnAP profiles are human-readable and reveal a great deal of information about an application's memory behavior. In addition to providing insight into application behavior, the profiles can be used to replay a proxy synthetic address stream for analysis. We demonstrate that the synthetic address streams closely mimic the behavior of the originals.
