IR-level versus machine-level if-conversion for predicated architectures

If-conversion is a simple yet powerful optimization that converts control dependences into data dependences. It eliminates branches and increases the available instruction-level parallelism, which in turn improves overall performance. If-conversion can be applied on its own or in combination with other techniques that enlarge scheduling regions. Hardware support for predicated execution allows if-conversion to be applied broadly throughout a program, so the optimization must be guided by heuristic estimates of its potential benefit. Like other transformations in an optimizing compiler, if-conversion inherently suffers from phase ordering issues. Motivated by these observations, we developed two if-conversion algorithms for the TI TMS320C64x+ architecture within the LLVM framework. Each implementation targets a different level of code abstraction: one operates on the intermediate representation, the other on machine-level code. Both use an adapted set of estimation heuristics and prove successful in general, but each exhibits different strengths and weaknesses. High-level if-conversion, applied before other control flow transformations, has more freedom to operate, but its runtime estimates are less accurate than those of its more restricted machine-level counterpart. Our experimental evaluation shows a mean speedup close to 14% for both algorithms on a set of programs from the MiBench and DSPstone benchmark suites. We compare the implemented optimizations and discuss the insights gained regarding if-conversion, phase ordering issues, and profitability analysis.
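As a minimal illustration of what converting a control dependence into a data dependence means, the sketch below shows a small clamp function in branching form and in a hand-written if-converted form. The function names (clamp_branch, clamp_ifconv, check_forms_agree) are hypothetical and chosen only for this example; the code is not taken from either of the two algorithms described above. The conditional expressions stand in for predicated moves, and whether a given compiler actually emits predicated instructions for them depends on the target and its heuristics.

```c
#include <assert.h>
#include <stdint.h>

/* Branching form: each assignment to r is control-dependent on a comparison. */
int32_t clamp_branch(int32_t x, int32_t lo, int32_t hi) {
    int32_t r = x;
    if (x < lo) {
        r = lo;
    } else if (x > hi) {
        r = hi;
    }
    return r;
}

/* If-converted form (illustrative): the comparisons become predicate values,
 * and each assignment is guarded by a predicate instead of a branch.
 * The bitwise & avoids the implicit branch of a short-circuit &&. */
int32_t clamp_ifconv(int32_t x, int32_t lo, int32_t hi) {
    int p_lo = (x < lo);            /* predicate of the "then" path    */
    int p_hi = (!p_lo) & (x > hi);  /* predicate of the "else if" path */
    int32_t r = x;
    r = p_lo ? lo : r;              /* models: [p_lo] mov r, lo */
    r = p_hi ? hi : r;              /* models: [p_hi] mov r, hi */
    return r;
}

/* Hypothetical self-check: both forms agree on a small range of inputs. */
void check_forms_agree(void) {
    for (int32_t x = -20; x <= 20; ++x) {
        assert(clamp_branch(x, 0, 10) == clamp_ifconv(x, 0, 10));
    }
}
```

In the if-converted form, both guarded assignments are always issued, so the transformation trades extra executed operations for the removal of branches. This is why its benefit has to be estimated rather than assumed.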
