Loaf: a framework and infrastructure for creating online adaptive solutions

Achieving effective online adaptation for natively executed applications has proved quite challenging, and to date such adaptation has not been widely adopted. Traditionally, to enable online adaptation for native binary applications, a run-time layer is added that virtualizes the execution of the application by performing dynamic binary-to-binary translation. This virtual layer injects trampolines and instrumentation into the translated code to maintain control of the application. This approach adds significant overhead and complexity to the application, discouraging its use for online adaptation in commercial deployments, particularly in the modern datacenter computing domain. In this work we present a new lightweight paradigm for online adaptation that leverages current microarchitectural advances to enable efficient online monitoring and adaptation without the complexity of binary translation or fine-grain instrumentation. Our methodology takes advantage of the hardware performance monitors ubiquitous in modern chip microarchitectures to dynamically monitor microarchitectural events and application behavior with negligible overhead. By leveraging these capabilities to develop an innovative lightweight online adaptation framework (Loaf), we are able to address a number of important real-world online adaptation problems.
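
The abstract does not describe Loaf's control logic, so the following is only an illustrative sketch of the general idea of counter-driven online adaptation, not the framework's actual method. It assumes a monitor that periodically reads two hardware counters (retired instructions and last-level cache misses), derives misses per thousand instructions (MPKI), and switches between two hypothetical execution modes, "baseline" and "adapted", using a pair of thresholds for hysteresis. The `Sample` type, the `mpki` and `adapt` functions, the mode names and the threshold values are all hypothetical. On Linux such counts would typically be read through the `perf_event_open` interface; the sketch replaces that with synthetic values so it runs anywhere.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Sample:
    """One sampling interval of (hypothetical) hardware counter readings."""
    instructions: int  # retired instructions during the interval
    llc_misses: int    # last-level cache misses during the interval


def mpki(s: Sample) -> float:
    """Misses per thousand instructions; 0.0 for an empty interval."""
    if s.instructions == 0:
        return 0.0
    return 1000.0 * s.llc_misses / s.instructions


def adapt(samples: Iterable[Sample], high: float, low: float) -> List[str]:
    """Choose an execution mode per interval.

    Switches to "adapted" when MPKI rises above `high` and back to
    "baseline" only when it falls below `low`; the gap between the two
    thresholds (hysteresis) avoids flapping on noisy samples.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    mode = "baseline"
    decisions: List[str] = []
    for s in samples:
        rate = mpki(s)
        if mode == "baseline" and rate > high:
            mode = "adapted"
        elif mode == "adapted" and rate < low:
            mode = "baseline"
        decisions.append(mode)
    return decisions
```

For example, intervals with MPKI of 2, 15, 8 and 1 under thresholds `high=10.0` and `low=5.0` yield `["baseline", "adapted", "adapted", "baseline"]`: the third interval stays adapted because 8 is not below the low threshold. Because the monitor only reads counters the hardware already maintains, this style of adaptation needs no binary translation of the application itself.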
