DarkMem: Fine-grained power management of local memories for accelerators in embedded systems

SRAM accounts for a growing fraction of the static power in heterogeneous SoCs, because embedded memories occupy 70% to 90% of the area of specialized accelerators. We present DarkMem, a comprehensive solution for fine-grained power management of accelerator local memories. At design time, the DarkMem methodology optimizes the bank configuration of each accelerator to maximize power-gating opportunities. At run time, the DarkMem microarchitecture varies the operating mode of each memory bank according to the accelerator workload. In our experiments, DarkMem reduces SRAM static power by more than 40% on average, which translates into an average reduction of total power of almost 18%, with less than 1% overhead.
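The two halves of the approach, a design-time choice of bank configuration and a run-time choice of per-bank operating mode, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not DarkMem's actual algorithm: the three modes (active, retention, off), the leakage values `CELL_LEAK` and `PERIPH_LEAK`, the `Phase` trace format, and all function names are hypothetical. It assumes live data is packed from address 0 with the accessed words first. In each phase, banks holding accessed data stay active, banks holding only live data go to retention, and empty banks are switched off. A design-time sweep then picks the bank count with the lowest total static energy.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Hypothetical operating modes of one SRAM bank."""
    ACTIVE = "active"        # readable and writable, full leakage
    RETENTION = "retention"  # contents preserved at reduced leakage
    OFF = "off"              # power-gated, contents lost


# Hypothetical leakage per word in each mode (relative units, ACTIVE = 1.0).
CELL_LEAK = {Mode.ACTIVE: 1.0, Mode.RETENTION: 0.3, Mode.OFF: 0.05}
# Hypothetical always-on periphery leakage per bank: the cost of adding banks.
PERIPH_LEAK = 64.0


@dataclass
class Phase:
    cycles: int          # duration of the workload phase
    live_words: int      # words whose contents must survive this phase
    accessed_words: int  # words read or written in this phase (a subset of live)


def bank_modes(num_banks: int, capacity: int, phase: Phase) -> list[Mode]:
    """Pick a mode per bank, assuming data is packed from address 0, accessed words first."""
    assert capacity % num_banks == 0, "banks must have equal size"
    assert phase.accessed_words <= phase.live_words <= capacity
    bank_size = capacity // num_banks
    active = math.ceil(phase.accessed_words / bank_size)  # banks touched this phase
    live = math.ceil(phase.live_words / bank_size)        # banks holding live data
    return [Mode.ACTIVE if b < active else Mode.RETENTION if b < live else Mode.OFF
            for b in range(num_banks)]


def static_energy(num_banks: int, capacity: int, trace: list[Phase]) -> float:
    """Static energy of the whole trace when every bank follows bank_modes()."""
    bank_size = capacity // num_banks
    energy = 0.0
    for phase in trace:
        modes = bank_modes(num_banks, capacity, phase)
        per_cycle = sum(bank_size * CELL_LEAK[m] + PERIPH_LEAK for m in modes)
        energy += per_cycle * phase.cycles
    return energy


def best_bank_count(capacity: int, trace: list[Phase], candidates: list[int]) -> int:
    """Design-time sweep: the candidate bank count with the lowest static energy."""
    return min(candidates, key=lambda b: static_energy(b, capacity, trace))


if __name__ == "__main__":
    trace = [Phase(1000, 4096, 1024), Phase(1000, 2048, 512), Phase(2000, 1024, 1024)]
    print(best_bank_count(4096, trace, [1, 2, 4, 8, 16, 32]))  # prints 4
```

With these invented numbers the sweep settles on four banks. Coarser banks keep too much idle capacity powered, while finer banks pay more in periphery leakage than they save in gated cells. Realistic values would have to come from SRAM characterization and accelerator traces.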
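The two reported averages are linked by simple arithmetic. This assumes that DarkMem changes only SRAM static power, ignoring its small overhead. It also treats the per-benchmark averages as if they were single values, which is only approximate. Under these assumptions, the results imply that SRAM static power is roughly 45% of the total power of the evaluated designs:

$$\frac{\Delta P_{\mathrm{tot}}}{P_{\mathrm{tot}}} \approx \frac{P_{\mathrm{SRAM}}}{P_{\mathrm{tot}}}\cdot\frac{\Delta P_{\mathrm{SRAM}}}{P_{\mathrm{SRAM}}} \quad\Longrightarrow\quad \frac{P_{\mathrm{SRAM}}}{P_{\mathrm{tot}}} \approx \frac{0.18}{0.40} = 0.45$$

The SRAM saving is more than 40% and the total saving is just under 18%. This back-of-the-envelope share is therefore an upper estimate rather than a measured figure.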
