To deliver the promise of Moore's Law to the end user, compilers must make decisions that are intimately tied to a specific target architecture. As engineers add architectural features to increase performance, systems become harder to model, and it therefore becomes harder for a compiler to make effective decisions. Machine-learning techniques may help compiler writers model modern architectures: because they can effectively make sense of high-dimensional spaces, they are a valuable tool for clarifying and discerning complex decision boundaries. In our work we focus on loop unrolling, a well-known optimization for exposing instruction-level parallelism. Using the Open Research Compiler (ORC) as a testbed, we demonstrate how one can use supervised learning techniques to model the appropriateness of loop unrolling. We use more than 1,100 loops, drawn from 46 benchmarks, to train a simple learning algorithm to recognize when loop unrolling is advantageous. The resulting classifier predicts with 88% accuracy whether a novel loop (i.e., one that was not in the training set) benefits from loop unrolling. Furthermore, it predicts the optimal or nearly optimal unroll factor 74% of the time. We evaluate the ramifications of these prediction accuracies using ORC and the Itanium 2 architecture. The learned classifier yields a 6% speedup over ORC's unrolling heuristic on the SPEC benchmarks, and a 7% speedup on the remainder of our benchmarks. Because the learning techniques we employ run very quickly, we were able to exhaustively determine the four loop characteristics most salient for deciding when unrolling is beneficial.
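To make the described setup concrete, the sketch below shows one way a supervised classifier for unroll factors could be assembled: extract numeric features from each loop, label each loop with the unroll factor that performed best empirically, train a classifier, and query it for loops it has never seen. This is only an illustrative sketch assuming scikit-learn; the feature names, the synthetic data, and the choice of a nearest-neighbor model are hypothetical and are not the paper's actual feature set or learning algorithm.

```python
# Illustrative sketch of supervised classification for unroll factors.
# Feature names, synthetic data, and the nearest-neighbor model are
# assumptions for this example, not the paper's implementation.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Hypothetical static loop features, one row per loop:
# [trip count, # operations, # memory accesses, # floating-point ops, nest depth]
X = rng.integers(1, 200, size=(1100, 5)).astype(float)

# Labels: the empirically best unroll factor for each loop (1 = do not unroll).
# In practice these would come from timing each loop at every candidate factor;
# here they are random placeholders so the example runs.
y = rng.choice([1, 2, 4, 8], size=1100)

# Scale features, then classify a novel loop by its nearest labeled neighbors.
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Held-out accuracy estimates how often the classifier picks a good unroll
# factor for loops that were not in the training set.
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")

# Once trained on real measurements, the classifier stands in for a
# hand-written unrolling heuristic: feed it the features of a new loop
# and unroll by the predicted factor.
clf.fit(X, y)
new_loop = np.array([[64, 40, 12, 8, 2]])  # features of an unseen loop
print("predicted unroll factor:", clf.predict(new_loop)[0])
```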