IR-level versus machine-level if-conversion for predicated architectures

If-conversion is a simple yet powerful optimization that converts control dependences into data dependences. It eliminates branches and increases the available instruction-level parallelism, and thus improves overall performance. If-conversion can be applied either alone or in combination with other techniques that enlarge scheduling regions. Hardware support for predicated execution allows if-conversion to be applied broadly within a program, which makes it necessary to guide the optimization with heuristic estimates of its potential benefit. Like other transformations in an optimizing compiler, if-conversion inherently suffers from phase-ordering issues. Motivated by these observations, we developed two if-conversion algorithms targeting the TI TMS320C64x+ architecture within the LLVM framework. Each implementation targets a different level of code abstraction: one operates on the intermediate representation (IR), the other on machine-level code. Both use an adapted set of estimation heuristics and are successful in general, but each exhibits different strengths and weaknesses. High-level if-conversion, applied before other control-flow transformations, has more freedom to operate, but its runtime estimates are less accurate than those of its more restricted machine-level counterpart. Our experimental evaluation shows a mean speedup close to 14% for both algorithms on a set of programs from the MiBench and DSPstone benchmark suites. We compare the two implementations and discuss the insights gained on if-conversion, phase ordering and profitability analysis.
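The core idea of turning a control dependence into a data dependence, and of weighing that change against its cost, can be illustrated with a toy model. The Python sketch below is not the authors' LLVM implementation and does not model the TMS320C64x+ instruction set; the instruction format `Instr`, the helpers `execute`, `run_branchy` and `if_convert`, and the cost model in `predication_gain` are all hypothetical. It if-converts a two-armed diamond by guarding the then-arm with a predicate register and the else-arm with its complement, checks that the resulting straight-line block computes the same registers as the branching version, and adds a deliberately simple profitability estimate of the kind a heuristic might use.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple, Union

Operand = Union[str, int]  # register name or integer literal

# Toy operation table (hypothetical, not the C64x+ instruction set).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "gt": lambda a, b: int(a > b),
    "mov": lambda a, _b: a,
}


@dataclass(frozen=True)
class Instr:
    dest: str
    op: str
    srcs: Tuple[Operand, ...]
    # Guard: (predicate register, required truth value); None = unconditional.
    pred: Optional[Tuple[str, bool]] = None


def read(regs: Dict[str, int], operand: Operand) -> int:
    return operand if isinstance(operand, int) else regs[operand]


def execute(block: List[Instr], regs: Dict[str, int]) -> Dict[str, int]:
    """Run a straight-line block; instructions whose guard is false are squashed."""
    for ins in block:
        if ins.pred is not None:
            preg, sense = ins.pred
            if bool(regs[preg]) != sense:
                continue  # squashed: no register is written
        a = read(regs, ins.srcs[0])
        b = read(regs, ins.srcs[1]) if len(ins.srcs) > 1 else 0
        regs[ins.dest] = OPS[ins.op](a, b)
    return regs


def run_branchy(cond, cond_reg, then_arm, else_arm, regs):
    """Reference semantics: evaluate the condition, then take one arm (a branch)."""
    execute(cond, regs)
    return execute(then_arm if regs[cond_reg] else else_arm, regs)


def if_convert(cond, cond_reg, then_arm, else_arm):
    """Replace the diamond by one predicated block (control -> data dependence).

    Assumes the arms are unpredicated and never overwrite cond_reg.
    """
    guarded_then = [replace(i, pred=(cond_reg, True)) for i in then_arm]
    guarded_else = [replace(i, pred=(cond_reg, False)) for i in else_arm]
    return list(cond) + guarded_then + guarded_else


def predication_gain(n_then, n_else, p_then, branch_penalty, issue_width=1):
    """Hypothetical cost model: estimated cycles saved by if-converting.

    Branchy code issues only the taken arm but pays branch_penalty; predicated
    code issues both arms. Shared condition code is the same in both versions
    and is left out. A positive result favours if-conversion.
    """
    branchy = (p_then * n_then + (1 - p_then) * n_else) / issue_width + branch_penalty
    predicated = (n_then + n_else) / issue_width
    return branchy - predicated


if __name__ == "__main__":
    # x, y = (a + b, 2 * (a + b)) if a > b else (a - b, a - b)
    cond = [Instr("p", "gt", ("a", "b"))]
    then_arm = [Instr("x", "add", ("a", "b")), Instr("y", "mul", ("x", 2))]
    else_arm = [Instr("x", "sub", ("a", "b")), Instr("y", "mov", ("x",))]
    flat = if_convert(cond, "p", then_arm, else_arm)
    for a, b in [(5, 3), (1, 4)]:
        assert run_branchy(cond, "p", then_arm, else_arm, {"a": a, "b": b}) == \
            execute(flat, {"a": a, "b": b})
    print(predication_gain(2, 2, 0.5, branch_penalty=5))  # prints 3.0
```

In this toy model the estimated gain is positive for short arms and a sizable branch penalty (3.0 cycles for two-instruction arms and a penalty of 5), but turns negative once the arms grow (`predication_gain(20, 20, 0.5, branch_penalty=5)` returns -15.0). This illustrates why broadly applicable if-conversion has to be guided by profitability estimates rather than applied everywhere.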

Andreas Krall | Alexander Jordan | Nikolai Kim