We present a novel loop transformation technique that is particularly well suited to optimizing compilers for embedded systems, where an increase in compilation time is acceptable in exchange for a significant performance gain. The technique optimizes loops containing nested conditional blocks. Specifically, it exploits the fact that the Boolean value of the conditional expression, which selects the true or false path, can be analyzed statically using a novel interval analysis technique capable of evaluating conditional expressions in general polynomial form. The results of the interval analysis, combined with loop dependency information, are used to partition the iteration space of the nested loop. Where such a partition is found, the loop nest is decomposed so as to eliminate the conditional test, substantially reducing execution time. Unlike previous techniques, ours eliminates the conditional from the loops completely, which further facilitates the application of other optimizations and improves the overall speedup. Applying the proposed transformation to loop kernels taken from MediaBench, SPEC-2000, mpeg4, qsdpcm and gimp, we measured average execution-time improvements of 175% (1.75X) on a SPARC processor, 336% (4.36X) on an Intel Core Duo processor and 198.9% (2.98X) on a PowerPC G5 processor.
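As a rough illustration of the idea of partitioning the iteration space, the following C sketch shows a loop nest before and after a hand-applied split. It is not the algorithm or a kernel from the paper: the function names `kernel_original` and `kernel_split`, the helper `clamp_int`, the affine condition `i + j < k`, and the flat array layout are all assumptions chosen for brevity, and the split point is derived by hand rather than by interval analysis.

```c
#include <assert.h>

/* Hypothetical kernel: the conditional is evaluated on every iteration. */
long kernel_original(const int *a, int n, int k)
{
    assert(n >= 0);
    long sum = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i + j < k)
                sum += a[i * n + j];   /* true path  */
            else
                sum -= a[i * n + j];   /* false path */
        }
    }
    return sum;
}

/* Clamp v into [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Hand-split version (assumed, for illustration only): for a fixed i the
 * condition i + j < k holds exactly for j in [0, k - i), so the inner loop
 * is partitioned into a "true" range and a "false" range and the test
 * disappears from both loop bodies. Iteration order is unchanged, so any
 * dependence carried by the accumulation is preserved.
 */
long kernel_split(const int *a, int n, int k)
{
    assert(n >= 0);
    long sum = 0;
    for (int i = 0; i < n; i++) {
        int split = clamp_int(k - i, 0, n);  /* first j with i + j >= k */
        for (int j = 0; j < split; j++)
            sum += a[i * n + j];             /* condition known true  */
        for (int j = split; j < n; j++)
            sum -= a[i * n + j];             /* condition known false */
    }
    return sum;
}
```

Both functions should return the same value for any `n >= 0` and any `k` for which `k - i` does not overflow; the split version simply moves the decision from the loop body into the loop bounds, which is what leaves the resulting loops open to further optimization.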