An optimizing compiler cannot generate a single code pattern that is best for all input data: no one optimization fits all inputs. To attain high performance over a large range of inputs, it is therefore desirable to resort to some form of specialization. Data specialization significantly improves the performance of compiler-generated code. Specialization is, however, limited by code expansion, and it introduces a time overhead for selecting the appropriate version. We propose a new method that specializes code at the assembly level for loop structures. Our specialization scheme targets different ranges of loop trip count and combines the resulting versions into a single code that switches smoothly from one version to the next as the iteration count increases. Hence, the resulting code achieves the same level of performance as each version on its specific iteration interval. We illustrate the benefit of our method on the SPEC benchmarks with detailed experimental results. Copyright © 2008 John Wiley & Sons, Ltd.
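The idea of combining trip-count-specific versions into one loop can be illustrated at the source level. The following is only a minimal sketch under stated assumptions, not the assembly-level method of the paper: the function name `sum_specialized`, the threshold `SMALL_TRIP`, and the choice of a plain loop followed by a 4-way unrolled loop are all hypothetical. It shows one way a loop might start in the version suited to short trip counts and move to a version suited to longer ones as the iteration count grows, without a separate up-front selection step.

```c
#include <stddef.h>

/* Hypothetical threshold: for fewer iterations than this, the plain
   loop is assumed to be the cheapest version. */
#define SMALL_TRIP 8

/* Sums a[0..n-1]. Illustrative only: two loop versions are chained so
   that execution moves from one to the next as the iteration count
   increases. */
long sum_specialized(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    /* Version 1: plain loop with no setup cost, used for the first
       SMALL_TRIP iterations (or all of them when n is small). */
    size_t lim = n < SMALL_TRIP ? n : SMALL_TRIP;
    for (; i < lim; i++)
        s += a[i];

    /* Version 2: unrolled by 4 with independent accumulators, entered
       only when at least 4 iterations remain. */
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    s += s0 + s1 + s2 + s3;

    /* Remainder: at most 3 leftover iterations. */
    for (; i < n; i++)
        s += a[i];

    return s;
}
```

With this layout a call with a short array never pays for the unrolled version's extra accumulators, while a long array spends most of its time in the unrolled loop. Where the threshold should sit would depend on the target machine and is not specified here.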