Global Trade-off between Code Size and Performance for Loop Unrolling on VLIW Architectures

Many media processors [28, 7, 14, 8, 18, 27], used for compute-intensive embedded applications, are VLIW architectures that rely on the compiler to exploit instruction-level parallelism. Loop unrolling is generally used to expose instruction-level parallelism, but computing the unrolling factor is very difficult because instruction cache misses and spill code can cancel the expected benefit of the transformation. Moreover, increasing code size directly increases the cost of the embedded system. In this paper, we propose a method called UFC (Unrolling Factor computation under Constraints) to compute the unrolling factors of a set of loops while taking into account code size, a major issue for embedded systems.
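To make the "global" aspect of the trade-off concrete, the sketch below chooses one unrolling factor per loop so that the total code size stays within a budget while the total estimated cycle count is minimal. This is not the UFC algorithm described in the paper. It is a generic multiple-choice knapsack solved by dynamic programming over code size, and it assumes that per-loop cycle and size estimates (which would have to account for spill code and cache effects) are already available. The `Candidate` type, the `choose_unroll_factors` function and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One unrolling option for a loop (all values are hypothetical estimates)."""
    factor: int  # unrolling factor
    cycles: int  # estimated total cycles of the loop with this factor
    size: int    # estimated code size of the unrolled loop


def choose_unroll_factors(
    loops: list[list[Candidate]], budget: int
) -> Optional[tuple[int, int, list[int]]]:
    """Pick exactly one candidate per loop so that the total code size stays
    within `budget` and the total estimated cycles are minimal
    (multiple-choice knapsack).

    Returns (total_cycles, total_size, factors), or None if nothing fits."""
    # states[s] = (fewest cycles, chosen factors) among selections of total size s
    states: dict[int, tuple[int, list[int]]] = {0: (0, [])}
    for candidates in loops:
        nxt: dict[int, tuple[int, list[int]]] = {}
        for size, (cycles, factors) in states.items():
            for cand in candidates:
                new_size = size + cand.size
                if new_size > budget:
                    continue  # this option would exceed the code size budget
                new_cycles = cycles + cand.cycles
                if new_size not in nxt or new_cycles < nxt[new_size][0]:
                    nxt[new_size] = (new_cycles, factors + [cand.factor])
        states = nxt
        if not states:
            return None  # no combination of options fits in the budget
    # Fewest cycles wins; ties are broken by smaller code size.
    best_size, (best_cycles, best_factors) = min(
        states.items(), key=lambda kv: (kv[1][0], kv[0])
    )
    return best_cycles, best_size, best_factors


if __name__ == "__main__":
    loop_a = [Candidate(1, 1000, 10), Candidate(2, 700, 18), Candidate(4, 550, 34)]
    loop_b = [Candidate(1, 800, 8), Candidate(2, 600, 14), Candidate(4, 500, 26)]
    print(choose_unroll_factors([loop_a, loop_b], budget=50))  # (1150, 48, [4, 2])
    print(choose_unroll_factors([loop_a, loop_b], budget=40))  # (1300, 32, [2, 2])
```

With these made-up numbers, unrolling both loops by 4 would need 60 units of code and does not fit a budget of 50. The best feasible choice mixes factors (4 for the first loop, 2 for the second), and a tighter budget of 40 shifts the choice again. Deciding each loop's factor in isolation would not capture this interaction.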

[1] David F. Bacon et al. Compiler transformations for high-performance computing, 1994, CSUR.

[2] Jack J. Dongarra et al. Unrolling loops in Fortran, 1979, Softw. Pract. Exp.

[3] Brad Calder et al. Efficient procedure mapping using cache line coloring, 1997, PLDI '97.

[4] Michael F. P. O'Boyle et al. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation, 2004, The Journal of Supercomputing.

[5] Sanjay Jinturkar et al. Aggressive Loop Unrolling in a Retargetable Optimizing Compiler, 1996, CC.

[6] Thomas M. Conte et al. The Effect of Code Expanding Optimizations on Instruction Cache Design, 1993, IEEE Trans. Computers.

[7] Scott McFarling et al. Program optimization for instruction caches, 1989, ASPLOS III.

[8] Alexandru Nicolau et al. Parallel processing: a smart compiler and a dumb machine, 1984, SIGP.

[9] Zbigniew Chamski et al. GCDS: A Compiler Strategy for Trading Code Size Against Performance in Embedded Applications, 1998.

[10] Scott A. Mahlke et al. The superblock: An effective technique for VLIW and superscalar compilation, 1993, The Journal of Supercomputing.

[11] Michael F. P. O'Boyle et al. A Feasibility Study in Iterative Compilation, 1999, ISHPC.

[12] Peter M. W. Knijnenburg et al. Iterative compilation in a non-linear optimisation space, 1998.

[13] Thomas R. Gross et al. Avoidance and suppression of compensation code in a trace scheduling compiler, 1994, TOPL.

[14] Michael D. Smith et al. Overcoming the Challenges to Feedback-Directed Optimization, 2000, Dynamo.

[15] James E. Smith et al. A study of scalar compilation techniques for pipelined supercomputers, 1987, ASPLOS.

[16] W. W. Hwu et al. Achieving high instruction cache performance with an optimizing compiler, 1989, ISCA '89.

[17] Alexander Schrijver et al. Theory of linear and integer programming, 1986, Wiley-Interscience series in discrete mathematics and optimization.

[18] Michael E. Wolf et al. Combining Loop Transformations Considering Caches and Scheduling, 2004, International Journal of Parallel Programming.

[19] Scott A. Mahlke et al. Compiler code transformations for superscalar-based high-performance systems, 1992, Proceedings Supercomputing '92.