Variability-aware memory management for nanoscale computing

As the semiconductor industry continues to push the limits of sub-micron technology, the International Technology Roadmap for Semiconductors (ITRS) expects hardware variations (e.g., die-to-die, wafer-to-wafer, and chip-to-chip) to keep increasing over the next few decades. Designers must therefore build variation-aware software stacks that adapt to these variations and opportunistically exploit them to improve system performance and responsiveness while minimizing power consumption. The memory subsystem is one of the largest components in today's computing systems and a main contributor to overall system power consumption, which makes it one of the components most vulnerable to the effects of variations (e.g., in power). This paper discusses the concept of variability-aware memory management for nanoscale computing systems. We show how to opportunistically exploit hardware variations in on-chip and off-chip memory at the system level through the deployment of variation-aware software stacks.
