Power-Efficient Computer Architectures: Recent Advances

As Moore's Law and Dennard scaling trends have slowed, the challenge of building high-performance computer architectures while keeping power efficiency at acceptable levels has intensified. Over the past ten years, architecture techniques for power efficiency have shifted from focusing primarily on module-level efficiencies toward more holistic design styles based on parallelism and heterogeneity. This work highlights and synthesizes recent techniques and trends in power-efficient computer architecture.
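The reason the end of Dennard scaling matters for power can be shown with a small calculation. Dynamic power per transistor is roughly P = C * V^2 * f. Under ideal Dennard scaling by a factor k, capacitance falls by 1/k, voltage by 1/k, frequency rises by k, and transistor area falls by 1/k^2, so power density stays constant. If voltage stops scaling, power density instead grows by about k^2. The sketch below assumes these idealized scaling rules, ignores leakage power, and assumes frequency still rises by k. The function name `power_density_ratio` is hypothetical and used only for illustration.

```python
def power_density_ratio(k: float, voltage_scales: bool) -> float:
    """Relative dynamic power density after one scaling step by factor k.

    Hypothetical helper using idealized rules (leakage ignored):
      capacitance C -> C / k
      frequency   f -> f * k      (assumed to keep scaling)
      area        A -> A / k**2
      voltage     V -> V / k      only if voltage_scales is True
    Dynamic power per transistor is modeled as P = C * V**2 * f.
    """
    c = 1.0 / k                               # relative capacitance
    f = k                                     # relative clock frequency
    v = 1.0 / k if voltage_scales else 1.0    # relative supply voltage
    power_per_transistor = c * v**2 * f
    area_per_transistor = 1.0 / k**2
    # Power density = power per transistor / area per transistor
    return power_per_transistor / area_per_transistor


if __name__ == "__main__":
    k = 1.4  # roughly one process generation (~sqrt(2) linear shrink)
    print("Dennard scaling:      ", power_density_ratio(k, voltage_scales=True))
    print("Voltage stops scaling:", power_density_ratio(k, voltage_scales=False))
```

With k = 1.4, the ratio is about 1.0 when voltage scales and about 1.96 when it does not, which is the growing power-density pressure that motivates the techniques surveyed here.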
