Automatic tuning of two-level caches to embedded applications

The power consumed by the memory hierarchy of a microprocessor can contribute to as much as 50% of the total microprocessor system power, and is thus a good candidate for optimizations. We present an automated method for tuning two-level caches to embedded applications for reduced energy consumption. The method is applicable to both a simulation-based exploration environment and a hardware-based system prototyping environment. We introduce the two-level cache tuner, or TCaT - a heuristic for searching the huge solution space of possible configurations. The heuristic interlaces the exploration of the two cache levels and searches the various cache parameters in a specific order based on their impact on energy. We show the integrity of our heuristic across multiple memory configurations and even in the presence of hardware/software partitioning - a common optimization capable of achieving significant speedups and/or reduced energy consumption. We apply our exploration heuristic to a large set of embedded applications. Our experiments demonstrate the efficacy of our heuristic: on average the heuristic examines only 7% of the possible cache configurations, but results in cache sub-system energy savings of 53%, only 1% more than the optimal cache configuration. In addition, the configured cache achieves an average speedup of 30% over the base cache configuration due to tuning of cache line size to the application's needs.

[1]  Arijit Ghosh,et al.  Cache optimization for embedded processor cores: An analytical approach , 2004, ACM Trans. Design Autom. Electr. Syst..

[2]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[4]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[5]  Rajeev Balasubramonian,et al.  Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures , 2000, MICRO 33.

[6]  Bill Moyer,et al.  A low power unified cache architecture providing power and performance flexibility , 2000, ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514).

[7]  Doug Burger,et al.  Evaluating Future Microprocessors: the SimpleScalar Tool Set , 1996 .

[8]  Frank Vahid,et al.  Cache configuration exploration on prototyping platforms , 2003, 14th IEEE International Workshop on Rapid Systems Prototyping, 2003. Proceedings..

[9]  Maurizio Palesi,et al.  Multi-objective design space exploration using genetic algorithms , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).

[10]  Simon Segars Low power design techniques for microprocessors , 2000 .

[11]  Norman P. Jouppi,et al.  CACTI 2.0: An Integrated Cache Timing and Power Model , 2002 .

[12]  Frank Vahid,et al.  A highly configurable cache architecture for embedded systems , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[13]  Frank Vahid,et al.  Platune: a tuning framework for system-on-a-chip platforms , 2002, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[14]  Frank Vahid,et al.  Energy Advantages of Microprocessor Platforms with On-Chip Configurable Logic , 2002, IEEE Des. Test Comput..