A technique for variable dependence driven loop peeling

Loops are the source of many optimizations that improve program performance, particularly on modern high-performance architectures and on vector and multithreaded systems. Among these optimizations, loop peeling is an important technique for parallelizing computations: it moves the computations of early iterations out of the loop body so that the remaining iterations can be executed in parallel. A key issue in applying loop peeling is determining how many iterations must be peeled off the loop body. Current approaches rely on heuristics or ad hoc rules to peel a fixed or speculated number of iterations. To our knowledge, no formal or systematic technique exists that compilers can use to determine the number of iterations to peel based on program characteristics. In this paper we introduce a technique that uses variable dependence analysis to identify the number of iterations to be peeled off. Our goal is to find general techniques that accurately determine the ideal number of iterations for loop peeling while working within the context of other loop optimizations, including code motion.
