Online Phase-Adaptive Data Layout Selection

Good data layouts improve cache and TLB performance of object-oriented software, but unfortunately, selecting an optimal data layout a priori is NP-hard. This paper introduces layout auditing, a technique that selects the best among a set of layouts online (while the program is running). Layout auditing randomly applies different layouts over time and observes their performance. As it becomes confident about which layout performs best, it selects that layout with higher probability. But if a phase shift causes a different layout to perform better, layout auditing learns the new best layout. We implemented our technique in a product Java virtual machine, using copying generational garbage collection to produce different layouts, and tested it on 20 long-running benchmarks and 4 hardware platforms. Given any combination of benchmark and platform, layout auditing consistently performs close to the best layout for that combination, without requiring offline training.
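The selection loop described above (apply a layout at random, observe its performance, favor the apparent winner, and keep adapting across phase shifts) can be illustrated with a small sketch. This is not the algorithm from the paper. It is a generic softmax-style selector with recency-weighted cost estimates, and the names `LayoutAuditor`, `step`, and `temperature`, along with the assumption that performance is reported as a positive cost (for example, seconds per measurement interval), are hypothetical choices made for illustration.

```python
import math
import random


class LayoutAuditor:
    """Hypothetical sketch: choose layouts at random, biased toward low observed cost."""

    def __init__(self, layouts, step=0.3, temperature=0.1, seed=None):
        self.layouts = list(layouts)
        self.step = step                # weight given to the newest observation
        self.temperature = temperature  # larger values mean more exploration
        self.cost = {name: None for name in self.layouts}  # None = not yet tried
        self.rng = random.Random(seed)

    def probabilities(self):
        """Return a selection probability for each layout."""
        known = [c for c in self.cost.values() if c is not None]
        if not known:
            # Nothing observed yet: choose uniformly.
            return {name: 1.0 / len(self.layouts) for name in self.layouts}
        best = min(known)
        weights = {}
        for name, c in self.cost.items():
            # Untried layouts are treated optimistically, as good as the best.
            estimate = best if c is None else c
            # Relative slowdown versus the best layout drives the weight.
            weights[name] = math.exp(-(estimate - best) / (self.temperature * best))
        total = sum(weights.values())
        return {name: w / total for name, w in weights.items()}

    def choose(self):
        """Randomly pick the layout to apply for the next interval."""
        probs = self.probabilities()
        names = list(probs)
        return self.rng.choices(names, weights=[probs[n] for n in names])[0]

    def observe(self, layout, cost):
        """Fold a new cost measurement into the running estimate for a layout."""
        if cost <= 0:
            raise ValueError("cost must be positive")
        old = self.cost[layout]
        # A recency-weighted average lets old measurements fade, so the
        # estimate can follow a phase shift instead of averaging it away.
        self.cost[layout] = cost if old is None else old + self.step * (cost - old)
```

In use, a runtime would call `choose()` before each measurement interval, apply the returned layout (in the paper's setting, at a copying garbage collection), and report the measured cost through `observe()`. Because no layout's probability ever reaches zero, every layout is still audited occasionally, which is what allows the estimates to recover after a phase shift.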
