The Core Degree Based Tag Reduction on Chip Multiprocessor to Balance Energy Saving and Performance Overhead

Tag reduction is an approach to save energy of the cache system in a processor. Our previous work described that it can save more energy on a Chip Multiprocessor (CMP) than on a single-core processor. In this paper, we further investigate the problem on balancing energy saving and performance overhead when tag reduction is used for the low power Chip Multiprocessor (CMP). We first introduce the core degree concept which is defined as the number of cores that tag reduction can use for each thread. We then propose a core degree based tag approach that is to optimize the core degree such that the best balance of energy and performance can be achieved. In particular, as the basis for such optimization, the theoretical upper bounds of the energy savings and performance overhead are decided as function of the core degree. In our experiments, we use a 16-core CMP for example. In order to obtain the energy consumption and performance overhead with various core degrees, we construct an experimental environment, which is based on the Linux operating system. With the experimental environment, benchmarks of SPEC CPU2006 are used to evaluate our core degree based tag reduction. Finally, the experimental results show that the most desired balance of energy saving and performance overhead is achieved when core degree is set to 6.

[1]  Peter Petrov,et al.  Tag compression for low power in dynamically customizable embedded processors , 2004, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[2]  Peter Petrov,et al.  Virtual page tag reduction for low-power TLBs , 2003, Proceedings 21st International Conference on Computer Design.

[3]  Gianluca Palermo,et al.  Efficient Synchronization for Embedded On-Chip Multiprocessors , 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[4]  Long Zheng,et al.  I-Cache Tag Reduction for Low Power Chip Multiprocessor , 2009, 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications.

[5]  R. Stephany,et al.  A 200MHz 32b 0.5W CMOS RISC Microprocessor , 1998 .

[6]  Peter Petrov,et al.  Dynamic Tag Reduction for Low-Power Caches in Embedded Systems with Virtual Memory , 2006, International Journal of Parallel Programming.

[7]  Santosh G. Abraham,et al.  Chip multithreading: opportunities and challenges , 2005, 11th International Symposium on High-Performance Computer Architecture.

[8]  Xiangrong Zhou,et al.  Heterogeneously tagged caches for low-power embedded systems with virtual memory support , 2008, TODE.

[9]  Richard T. Witek,et al.  A 160 MHz 32 b 0.5 W CMOS RISC microprocessor , 1996, 1996 IEEE International Solid-State Circuits Conference. Digest of TEchnical Papers, ISSCC.

[10]  Erik Lindholm,et al.  NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.

[11]  Ruben W. Castelino,et al.  Internal Organization of the Alpha 21164, a 300-MHz 64-bit Quad-issue CMOS RISC Microprocessor , 1995, Digit. Tech. J..