Automatic compiler support for energy optimization can lead to better embedded system implementations with reduced design time and cost. Efficient solutions to energy optimization problems are particularly important for array-dominated applications, which spend a significant portion of their energy budget executing memory-related operations. Recent interest in multi-bank memory architectures and low-power operating modes motivates us to investigate whether current locality-oriented loop-level transformations are suitable from an energy perspective in a multi-bank architecture and, if not, how these transformations can be tuned to take into account the banked nature of the memory structure and the existence of low-power modes. In this paper, we discuss the similarities and conflicts between two complementary objectives, namely optimizing cache locality and reducing memory system energy, and examine whether loop transformations developed for the former objective can also be used for the latter. To test our approach, we implemented bank-conscious versions of three loop transformation techniques (loop fission/fusion, linear loop transformations, and loop tiling) in an experimental compiler infrastructure and measured the energy benefits on nine array-dominated codes. Our results show that the modified (memory bank-aware) loop transformations yield large energy savings in both cacheless and cache-based systems, and that the execution times of the resulting codes are competitive with those obtained using pure locality-oriented techniques in a cache-based system.