High-level Control Flow Transformations for Performance Improvement of Address-Dominated Multimedia Applications

This paper describes a set of novel high-level control flow transformations for improving the performance of typical address-dominated multimedia applications. We show that these transformations, applied at the source code level, can have a very large impact on execution time at the cost of limited overhead in code size for a broad range of instruction set processor families (e.g., CISC, RISC, DSP, VLIW). For a thorough evaluation, all transformations are applied to the C code of two real-life applications selected from the video and image processing domains. The effect of the transformations is analyzed in detail by compiling and executing the transformed programs on seven different programmable processors. The measured runtimes show significant improvements in all processor families when the transformed codes are compared to their initial versions, even when the latter are compiled using their native optimizing compilers with the most aggressive optimization features enabled. The average gains in execution time range from 40.2% to 87.7% depending on the driver, with an average overhead in code size between 21.1% and 100.9%.
