Energy reduction techniques for caches and multiprocessors

Energy consumption is a growing concern in many areas of computer architecture. In the handheld embedded market, but also in desktop machines and high-end server facilities, there is a demand for ever-increasing processing power at constant or even decreasing energy consumption. For processors embedded in battery-powered devices, consumers demand both more features and longer battery life. For commodity desktop and high-end server systems, the demand to reduce energy consumption is mostly fueled by cost, environmental issues, and the wish to have systems without noisy cooling. This dissertation studies several techniques that aim to reduce energy consumption in processors.

Some of the techniques presented in this dissertation focus on reducing energy consumption by decreasing the amount of data transferred between a processor and external memory. Since memory is one of the known bottlenecks in computer systems, manufacturers have had to employ increasingly aggressive techniques over the past decades to increase performance. The techniques proposed in this dissertation aim to improve, or at least maintain, performance while reducing the amount of energy dissipated in the memory subsystem.

Another part of this dissertation focuses on reducing energy by lowering the speed of nodes in multiprocessor systems in combination with turning off some of these nodes. Multiprocessor systems have gained significant interest in recent years, mostly because power constraints have prevented further increases in clock frequency and because instruction-level parallelism has suffered from diminishing returns. Because of the way energy is dissipated in semiconductor fabric, using multiple cores at a reduced frequency is an effective way to reduce energy consumption. As the components from which processors are built continue to shrink, this energy model is expected to change significantly in the coming years. Some of the techniques presented in this dissertation aim to reduce energy consumption in such contemporary and near-future multiprocessor systems.
