An Unfolding-Based Loop Optimization Technique

Loops are a major source of optimization opportunities for improving program performance, particularly on modern high-performance architectures and on vector and multithreaded systems. Techniques such as loop-invariant code motion, loop unrolling, and loop peeling have demonstrated their utility in compiler optimization. However, many of these techniques apply only in limited cases, when the loops are "well-structured" and easy to analyze. For instance, loop-invariant code motion works only when invariant code is inside loops; loop unrolling and loop peeling work effectively only when the array references are either constants or affine functions of the index variable. It is our contention that many opportunities are overlooked when optimizations are limited to well-structured loops. In many cases, even "badly-structured" loops can be transformed into well-structured ones. As a case in point, we show how some loop-dependent code can be turned into loop-invariant code by transforming the loops. The technique described in this paper relies on unfolding the loop for several initial iterations, so that more opportunities are exposed to existing compiler optimization techniques such as loop-invariant code motion, loop peeling, and loop unrolling.
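To make the idea concrete, the following is a minimal sketch of how unfolding an initial iteration can turn loop-dependent code into loop-invariant code. It is an illustration under stated assumptions, not the algorithm from the paper: the function names `scale_original` and `scale_unfolded`, the constant 10, and the choice of unfolding exactly one iteration are all hypothetical. In the original loop, `k + 1` cannot be hoisted because `k` is reassigned inside the body; once the first iteration is unfolded, `k` holds the same value in every remaining iteration, so the expression can be computed once.

```python
def scale_original(values, k):
    """Loop as written: k + 1 looks loop-dependent because k is reassigned."""
    out = []
    for v in values:
        out.append(v * (k + 1))  # uses the caller's k only in the first iteration
        k = 10                   # from the second iteration on, k is always 10
    return out


def scale_unfolded(values, k):
    """Same computation after unfolding the first iteration (hypothetical sketch)."""
    out = []
    if values:
        # Unfolded first iteration: still uses the caller-supplied k.
        out.append(values[0] * (k + 1))
        # In the remaining iterations k is always 10, so k + 1 is
        # loop-invariant and can be hoisted out of the loop.
        factor = 10 + 1
        for v in values[1:]:
            out.append(v * factor)
    return out
```

Both versions return the same list for any input; the unfolded one evaluates the addition once for the remaining iterations instead of once per iteration, which is the kind of opportunity that loop-invariant code motion can then exploit.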
