Predicting the impact of optimizations for embedded systems

When optimizations are applied, many decisions follow fixed strategies: always applying an optimization whenever it is applicable, applying optimizations in a fixed order, and assuming a fixed configuration for parameters such as tile size and loop unrolling factor. Although it is widely recognized that these fixed strategies may not be the most appropriate for producing high-quality code, especially for embedded systems, there are no general and automatic strategies that do otherwise. In this paper, we present a framework that enables these decisions to be made by predicting the impact of an optimization, taking into account resources and code context. The framework consists of optimization models, code models, and resource models, which are integrated to predict the impact of applying optimizations. Because data cache performance is important to embedded codes, we focus on it and present an instance of the framework for cache performance. Since most opportunities for cache improvement come from loop optimizations, we describe code, optimization, and cache models tailored to predicting the impact of applying loop optimizations for data locality. Experimentally, we demonstrate the need to apply optimizations selectively and show the performance benefit of using our framework to predict when to apply an optimization. We also show that the framework can be used to choose the most beneficial optimization when several optimizations can be applied to a loop nest. Finally, we show that the framework can be used to combine optimizations on a loop nest.
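
To make the idea of predicting an optimization's impact before applying it concrete, the following minimal sketch compares the data cache misses of a loop nest before and after loop interchange and applies the interchange only if the predicted miss count drops. It is an illustration under stated assumptions, not the framework described above: a small direct-mapped cache simulation stands in for the analytic code, optimization, and cache models, and the cache geometry, array size, and all function names are hypothetical.

```python
from typing import Iterable, Iterator


def count_misses(accesses: Iterable[int],
                 cache_lines: int = 64,
                 line_bytes: int = 32) -> int:
    """Count misses of a direct-mapped cache over a stream of byte addresses.

    The geometry (64 lines of 32 bytes) is an assumed example configuration.
    """
    tags = [None] * cache_lines      # memory block currently held by each line
    misses = 0
    for addr in accesses:
        block = addr // line_bytes   # memory block that contains the address
        slot = block % cache_lines   # the one line this block can occupy
        if tags[slot] != block:
            tags[slot] = block
            misses += 1
    return misses


def column_walk(n: int, elem_bytes: int = 8) -> Iterator[int]:
    """Addresses for `for j: for i: a[i][j]` over a row-major n x n array."""
    for j in range(n):
        for i in range(n):
            yield (i * n + j) * elem_bytes


def row_walk(n: int, elem_bytes: int = 8) -> Iterator[int]:
    """Addresses for the same nest after loop interchange: `for i: for j`."""
    for i in range(n):
        for j in range(n):
            yield (i * n + j) * elem_bytes


def should_apply(misses_before: int, misses_after: int) -> bool:
    """Apply the optimization only when the predicted miss count improves."""
    return misses_after < misses_before


n = 128  # hypothetical array dimension
before = count_misses(column_walk(n))
after = count_misses(row_walk(n))
print(f"misses before interchange: {before}")
print(f"misses after interchange:  {after}")
print(f"apply interchange: {should_apply(before, after)}")
```

The same comparison could rank several candidate optimizations for one loop nest by predicted misses, or evaluate a combination by predicting on the access stream that results from applying them in sequence.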
