Static Prediction of Loop Iteration Counts Using Machine Learning to Enable Hot Spot Optimizations

In general, program execution spends most of its time in a small fraction of the code, the so-called hot spots of the program. These regions, where optimization is most beneficial, consist mainly of loops and must be identified to enable hot spot optimizations. Identifying hot spots therefore requires estimating loop iteration counts, which are often not known in advance even at run time and are still harder to determine at compile time using static analyses alone. In this paper we present an approach that uses machine learning techniques to automatically generate heuristics that provide the compiler with knowledge of this run-time behavior, thus yielding more precise heuristics than those produced by purely static analyses. Our experimental results demonstrate the accuracy of our approach and show its general applicability to a wide range of programs with different behavior: our experiments cover 175 programs from 12 benchmark suites drawn from diverse real-world application domains. Among other benefits, our approach eliminates the need for manual annotation of run-time information, which automates and facilitates the development of complex software and thus improves the software engineering process.
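The core idea of the abstract, mapping static, compile-time loop features to a predicted iteration-count class, can be illustrated with a deliberately simplified sketch. The feature set (nesting depth, number of basic blocks, constant-bound flag), the two trip-count buckets, the training data, and the 1-nearest-neighbor learner below are all illustrative assumptions for exposition; they are not the paper's actual features, data, or model (the paper employs more sophisticated learners such as random forests).

```python
# Simplified sketch: predict a loop's iteration-count bucket from static,
# compile-time features. Features, training data, and the 1-nearest-neighbor
# model are illustrative assumptions, not the paper's actual method.

# Each training example: (nesting_depth, num_basic_blocks, has_constant_bound)
TRAIN = [
    ((1, 2, 1), "low"),    # e.g., small fixed-bound initialization loop
    ((1, 8, 0), "high"),   # data-dependent processing loop
    ((2, 4, 0), "high"),
    ((1, 3, 1), "low"),
    ((3, 10, 0), "high"),
    ((2, 2, 1), "low"),
]

def predict_bucket(features):
    """Classify a loop as 'low' or 'high' trip count via 1-nearest neighbor."""
    def dist(a, b):
        # Squared Euclidean distance over the static feature vector.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(((dist(features, f), lbl) for f, lbl in TRAIN),
                   key=lambda t: t[0])
    return label

if __name__ == "__main__":
    # A shallow loop with a constant bound lands near the "low" examples;
    # a deeper, data-dependent loop lands near the "high" ones.
    print(predict_bucket((1, 2, 1)))
    print(predict_bucket((2, 9, 0)))
```

In a realistic setting, such a model would be trained once on profiled benchmarks and then queried by the compiler at optimization time, so no profiling run of the program being compiled is needed.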
