Path Grammar Guided Trace Compression and Trace Approximation

Trace-driven simulation is an important technique for evaluating computer architecture innovations. However, using it to study parallel computers and applications is at best very challenging: acquiring, representing, and storing the traces are among the major issues. In this paper, we introduce path grammar guided trace compression (PGGTC) and effective address trace approximation (TA) to speed up compression and reduce trace sizes. PGGTC relies on static analysis to build rules and determine actions that guide online trace compression. Combined with gzip, PGGTC can compress control flow traces to sizes over 330 times smaller than gzip alone achieves. Compared to the widely used Sequitur algorithm alone, PGGTC with gzip is on average 40 times faster, while the traces are only 3 times bigger. PGGTC can also be used with Sequitur, doubling the compression ratio of Sequitur by itself while running 14 times faster. Address traces of parallel applications with significant randomness are often impossibly large even after being compressed with any lossless scheme, including PGGTC. To reduce effective address traces, we introduce trace approximation (TA): instead of compressing the effective addresses, TA generates addresses with similar performance behavior from very compact summaries of how memory is accessed during each structure instance. We demonstrate two approaches, selective dumping and memory signatures, for summarizing the properties of effective address sequences. Both approaches are validated by feeding the generated approximate traces to cache simulators with 25 different configurations. The simulated results are very close to those obtained from the full effective address traces, while the selectively dumped addresses or memory signatures require several orders of magnitude less disk space to store. In summary, we move trace-driven simulation into the realm of the feasible for larger parallel machines and applications.
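The validation idea behind trace approximation (summarize an address sequence, regenerate an approximate one, and compare cache behavior) can be illustrated with a toy sketch. This is not the paper's selective dumping or memory signature scheme. The summary format (lowest address, span, access count), the function names, and the single direct-mapped cache configuration are all assumptions chosen for illustration, and the input is a synthetic, uniformly random trace.

```python
import random

def miss_ratio(addresses, num_sets=64, line_bytes=32):
    """Simulate a direct-mapped cache and return its miss ratio."""
    tags = [None] * num_sets          # one cached line number per set
    misses = 0
    for addr in addresses:
        line = addr // line_bytes     # cache line the address falls in
        index = line % num_sets       # set selected by the line number
        if tags[index] != line:
            tags[index] = line        # replace on miss
            misses += 1
    return misses / len(addresses)

def summarize(addresses):
    """Hypothetical compact summary: lowest address, span, access count."""
    low = min(addresses)
    return {"low": low, "span": max(addresses) - low + 1, "count": len(addresses)}

def regenerate(summary, seed=0):
    """Generate an approximate trace from the summary alone."""
    rng = random.Random(seed)
    return [summary["low"] + rng.randrange(summary["span"])
            for _ in range(summary["count"])]

# Synthetic "full" trace: random accesses within a 64 KiB region.
source = random.Random(42)
full_trace = [0x10000 + source.randrange(1 << 16) for _ in range(50_000)]

summary = summarize(full_trace)       # three integers instead of 50,000
approx_trace = regenerate(summary)

full_mr = miss_ratio(full_trace)
approx_mr = miss_ratio(approx_trace)
print(f"full: {full_mr:.4f}  approximate: {approx_mr:.4f}")
```

For a trace this random, three integers are enough to reproduce the miss ratio closely. Real address streams have structure that such a crude summary would miss, which is why a richer summary per structure instance is needed in practice.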
