Paper information - Interprocedural optimizations for improving data cache performance of array-intensive embedded applications

As datasets processed by embedded processors increase in size and complexity, managing the upper levels of the memory hierarchy (e.g., caches) is becoming an important issue. A major limitation of most previously proposed cache locality optimization techniques is that they handle a single procedure at a time. This prevents compilers from capturing the data access interactions between procedures and may result in poor performance. In this paper, we look at loop and data transformations from a different angle and use them in an interprocedural optimization framework. Using the call graph representation of a given application, the proposed technique visits each node of this graph twice and applies loop and data transformations systematically to optimize array layouts across the whole program. Our experimental results show that this interprocedural locality optimization strategy is much more effective than previous locality-based techniques that handle each procedure in isolation.
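
To make the two-visit idea concrete, here is a minimal sketch of how a call graph might be traversed twice to settle on one layout per array. It is not the paper's algorithm: the abstract does not say what each visit computes, so the bottom-up/top-down split, the weighted vote, and every name below (`CALL_GRAPH`, `PREFS`, `choose_layouts`) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical call graph: caller -> callees (assumed acyclic).
CALL_GRAPH = {
    "main": ["init", "compute"],
    "init": [],
    "compute": ["kernel"],
    "kernel": [],
}

# Hypothetical per-procedure preferences: array -> (layout, weight).
# The weight stands in for an estimated access count.
PREFS = {
    "main": {},
    "init": {"A": ("row", 1)},
    "compute": {"A": ("col", 4), "B": ("row", 2)},
    "kernel": {"A": ("col", 8), "B": ("col", 1)},
}


def postorder(graph, root):
    """Return procedures so that every callee precedes its callers."""
    seen, order = set(), []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for callee in graph[node]:
            visit(callee)
        order.append(node)

    visit(root)
    return order


def choose_layouts(graph, prefs, root):
    order = postorder(graph, root)

    # First visit (bottom-up): accumulate weighted layout votes per array.
    votes = defaultdict(lambda: defaultdict(int))
    for proc in order:
        for array, (layout, weight) in prefs[proc].items():
            votes[array][layout] += weight
    layouts = {array: max(v, key=v.get) for array, v in votes.items()}

    # Second visit (top-down): note the procedures whose preferred layout
    # lost the vote; their loops would need restructuring instead.
    needs_loop_transform = defaultdict(list)
    for proc in reversed(order):
        for array, (layout, _) in prefs[proc].items():
            if layout != layouts[array]:
                needs_loop_transform[proc].append(array)

    return layouts, dict(needs_loop_transform)


layouts, pending = choose_layouts(CALL_GRAPH, PREFS, "main")
```

In this toy input, the heavily weighted accesses in `kernel` and `compute` decide the layout of `A`, and `init` is flagged as a procedure whose loops would have to adapt. A single-procedure optimizer looking only at `init` would have picked the opposite layout, which is the kind of interaction the abstract says intraprocedural techniques miss.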

Wei Zhang | Mahmut T. Kandemir | Mustafa Karaköy | Guangyu Chen
