Embedded RAIDs-on-chip for bus-based chip-multiprocessors

The dual effects of larger die sizes and technology scaling, combined with aggressive voltage scaling for power reduction, increase the error rates for on-chip memories. Traditional on-chip memory reliability techniques (e.g., ECC) incur significant power and performance overheads. In this article, we propose a low-power-and-performance-overhead Embedded RAID (E-RAID) strategy and present Embedded RAIDs-on-Chip (E-RoC), a distributed dynamically managed reliable memory subsystem for bus-based Chip-Multiprocessors. E-RoC achieves reliability through redundancy by optimizing RAID-like policies tuned for on-chip distributed memories. We achieve on-chip reliability of memories through the use of Distributed Dynamic ScratchPad Allocatable Memories (DSPAMs) and their allocation policies. We exploit aggressive voltage scaling to reduce power consumption overheads due to parallel DSPAM accesses, and rely on the E-RoC Manager to automatically handle any resulting voltage-scaling-induced errors. We demonstrate how E-RAIDs can further enhance the fault tolerance of traditional memory reliability approaches by designing E-RAID levels that exploit ECC. Finally, we show the power and flexibility of the E-RoC concept by showing the benefits of having a heterogeneous E-RAID levels that fit each application's needs (fault tolerance, power/energy, performance). Our experimental results on CHStone/Mediabench II benchmarks show that our E-RAID levels converge to 100% error-free data rates much faster than traditional ECC approaches. Moreover, E-RAID levels that exploit ECC can guarantee 99.9% error-free data rates at ultra low Vdd on average, where as traditional ECC approaches were able to attain at most 99.1% error-free data rates. We observe an average of 22% dynamic power consumption increase by using traditional ECC approaches with respect to the baseline (non-voltage scaled SPMs), whereas our E-RAID levels are able to save dynamic power consumption by an average of 27% (w.r.t. the same non-voltage scaled SPMs baseline), while incurring worst-case 2% higher performance overheads than traditional ECC approaches. By voltage scaling the memories, we see that traditional ECC approaches are able to save static energy by 6.4% (average), where as our E-RAID approaches achieve 23.4% static energy savings (average). Finally, we observe that mixing E-RAID levels allows us to further reduce the dynamic power consumption by up to 55.5% at the cost of an average 5.6% increase in execution time over traditional approaches.

[1]  Nikil D. Dutt,et al.  Towards Embedded RAIDs-on-Chip , 2011 .

[2]  Mihaela van der Schaar,et al.  Software adaptation in quality sensitive applications to deal with hardware variability , 2010, GLSVLSI '10.

[3]  Aviral Shrivastava,et al.  Dynamic code mapping for limited local memory systems , 2010, ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors.

[4]  Nikil D. Dutt,et al.  Formal performance evaluation of AMBA-based system-on-chip designs , 2006, EMSOFT '06.

[5]  Robert J. T. Morris,et al.  The evolution of storage systems , 2003, IBM Syst. J..

[6]  Nikil D. Dutt,et al.  E-RoC: Embedded RAIDs-on-Chip for low power distributed dynamically managed reliable memories , 2011, 2011 Design, Automation & Test in Europe.

[7]  Hiroyuki Tomiyama,et al.  CHStone: A benchmark program suite for practical C-based high-level synthesis , 2008, 2008 IEEE International Symposium on Circuits and Systems.

[8]  Howard Leo Kalter,et al.  A 50-ns 16-Mb DRAM with a 10-ns data rate and on-chip ECC , 1990 .

[9]  Anna W. Topol,et al.  Stable SRAM cell design for the 32 nm node and beyond , 2005, Digest of Technical Papers. 2005 Symposium on VLSI Technology, 2005..

[10]  Gu-Yeon Wei,et al.  Process Variation Tolerant 3T1D-Based Cache Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[11]  Nikil D. Dutt,et al.  A Multi-Granularity Power Modeling Methodology for Embedded Processors , 2011, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[12]  Mohamed Shalan,et al.  A dynamic memory management unit for embedded real-time system-on-a-chip , 2000, CASES '00.

[13]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[14]  Aviral Shrivastava,et al.  Mitigating soft error failures for multimedia applications by selective data protection , 2006, CASES '06.

[15]  K. Ishibashi,et al.  16.7 fA/cell tunnel-leakage-suppressed 16 Mb SRAM for handling cosmic-ray-induced multi-errors , 2003, 2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC..

[16]  Tulika Mitra,et al.  Integrated scratchpad memory optimization and task scheduling for MPSoC architectures , 2006, CASES '06.

[17]  Nur A. Touba,et al.  Reducing power consumption in memory ECC checkers , 2004, 2004 International Conferce on Test.

[18]  Soontae Kim Area-Efficient Error Protection for Caches , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[19]  Mahmut T. Kandemir,et al.  Improving scratch-pad memory reliability through compiler-guided data block duplication , 2005, ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005..

[20]  Nikil D. Dutt,et al.  A Methodology for Power-aware Pipelining via High-Level Performance Model Evaluations , 2009, 2009 10th International Workshop on Microprocessor Test and Verification.

[21]  Luca Benini,et al.  An integrated hardware/software approach for run-time scratchpad management , 2004, Proceedings. 41st Design Automation Conference, 2004..

[22]  Nikil D. Dutt,et al.  HaVOC: A hybrid memory-aware virtualization layer for on-chip distributed ScratchPad and Non-Volatile Memories , 2012, DAC Design Automation Conference 2012.

[23]  Nikil D. Dutt,et al.  Efficient utilization of scratch-pad memory in embedded processor applications , 1997, Proceedings European Design and Test Conference. ED & TC 97.

[24]  Nikil D. Dutt,et al.  FFT-Cache: A Flexible Fault-Tolerant Cache architecture for ultra low voltage operation , 2011, 2011 Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES).

[25]  Heonshik Shin,et al.  Dynamic scratchpad memory management for code in portable systems with an MMU , 2008, TECS.

[26]  N. Okumura,et al.  A 600 MHz single-chip multiprocessor with 4.8 GB/s internal shared pipelined bus and 512 kB internal memory , 2003, 2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC..

[27]  Nikil D. Dutt,et al.  COSMECA: Application Specific Co-Synthesis of Memory and Communication Architectures for MPSoC , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[28]  Chin-Long Chen,et al.  Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review , 1984, IBM J. Res. Dev..

[29]  Narayanan Vijaykrishnan,et al.  Working with Process Variation Aware Caches , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[30]  Avesta Sasan,et al.  A fault tolerant cache architecture for sub 500mV operation: resizable data composer cache (RDC-cache) , 2009, CASES '09.

[31]  Kaushik Roy,et al.  A 160 mV, fully differential, robust schmitt trigger based sub-threshold SRAM , 2007, Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07).

[32]  Wei Zhang,et al.  ICR: in-cache replication for enhancing data cache reliability , 2003, 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings..

[33]  A.P. Chandrakasan,et al.  A 256-kb 65-nm Sub-threshold SRAM Design for Ultra-Low-Voltage Operation , 2007, IEEE Journal of Solid-State Circuits.

[34]  James Tschanz,et al.  Parameter variations and impact on circuits and microarchitecture , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[35]  Peter Marwedel,et al.  Scratchpad memory: a design alternative for cache on-chip memory in embedded systems , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).

[36]  Edward J. McCluskey,et al.  PADded cache: a new fault-tolerance technique for cache memories , 1999, Proceedings 17th IEEE VLSI Test Symposium (Cat. No.PR00146).

[37]  Farshad Moradi,et al.  65NM sub-threshold 11T-SRAM for ultra low voltage applications , 2008, 2008 IEEE International SOC Conference.

[38]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[39]  Ahmed M. Eltawil,et al.  Low-Power Multimedia System Design by Aggressive Voltage Scaling , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[40]  Nikil D. Dutt,et al.  Inter-kernel data reuse and pipelining on chip-multiprocessors for multimedia applications , 2009, 2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia.

[41]  Aviral Shrivastava,et al.  Heap data management for limited local memory (LLM) multi-core processors , 2010, 2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[42]  Luca Benini,et al.  Reliability Support for On-Chip Memories Using Networks-on-Chip , 2006, 2006 International Conference on Computer Design.

[43]  Randy H. Katz,et al.  A case for redundant arrays of inexpensive disks (RAID) , 1988, SIGMOD '88.

[44]  Nikil D. Dutt,et al.  CAPPS: A Framework for Power–Performance Tradeoffs in Bus-Matrix-Based On-Chip Communication Architecture Synthesis , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[45]  M. A. Lucente,et al.  Memory system reliability improvement through associative cache redundancy , 1990, IEEE Proceedings of the Custom Integrated Circuits Conference.

[46]  Sang Lyul Min,et al.  Scratchpad Memory Management Techniques for Code in Embedded Systems without an MMU , 2010, IEEE Transactions on Computers.

[47]  Hiroaki Takada,et al.  Partitioning and allocation of scratch-pad memory for priority-based preemptive multi-task systems , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[48]  Nikil D. Dutt,et al.  Fast exploration of bus-based on-chip communication architectures , 2004, International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004..

[49]  Amin Ansari,et al.  Enabling ultra low voltage system operation by tolerating on-chip cache failures , 2009, ISLPED.

[50]  Sudeep Pasricha Transaction level modeling of SoC with SystemC 2.0 , 2004 .

[51]  Wei Zhang,et al.  Enhancing data cache reliability by the addition of a small fully-associative replication cache , 2004, ICS '04.

[52]  Hiroaki Takada,et al.  Minimizing inter-task interferences in scratch-pad memory usage for reducing the energy consumption of multi-task systems , 2010, CASES '10.

[53]  Bede Liu,et al.  Understanding multimedia application characteristics for designing programmable media processors , 1998, Electronic Imaging.

[54]  Amin Ansari,et al.  ZerehCache: Armoring cache architectures in high defect density technologies , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[55]  Haridimos T. Vergos,et al.  Efficient fault tolerant cache memory design , 1995, Microprocess. Microprogramming.

[56]  Chaitali Chakrabarti,et al.  Energy-aware error control coding for Flash memories , 2009, 2009 46th ACM/IEEE Design Automation Conference.

[57]  Babak Falsafi,et al.  Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[58]  Puneet Gupta,et al.  Variation-aware speed binning of multi-core processors , 2010, 2010 11th International Symposium on Quality Electronic Design (ISQED).

[59]  Peter Marwedel,et al.  Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications , 2007, SCOPES '07.

[60]  M. Sachdev,et al.  A multiword based high speed ECC scheme for low-voltage embedded SRAMS , 2008, ESSCIRC 2008 - 34th European Solid-State Circuits Conference.

[61]  Mahmut T. Kandemir,et al.  Dynamic management of scratch-pad memory space , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[62]  Peter Marwedel,et al.  Data partitioning for maximal scratchpad usage , 2003, ASP-DAC '03.

[63]  Tulika Mitra,et al.  Scratchpad allocation for concurrent embedded software , 2010, TOPL.

[64]  Nikil D. Dutt,et al.  E < MC2: less energy through multi-copy cache , 2010, CASES '10.

[65]  Subramanian Ramaswamy,et al.  Improving cache efficiency via resizing + remapping , 2007, 2007 25th International Conference on Computer Design.

[66]  Puneet Gupta,et al.  A case for opportunistic embedded sensing in presence of hardware power variability , 2010 .

[67]  Avesta Sasan,et al.  Limits on voltage scaling for caches utilizing fault tolerant techniques , 2007, 2007 25th International Conference on Computer Design.

[68]  Rouwaida Kanj,et al.  Cross Layer Error Exploitation for Aggressive Voltage Scaling , 2007, 8th International Symposium on Quality Electronic Design (ISQED'07).

[69]  Todd M. Austin,et al.  SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.

[70]  Alaa R. Alameldeen,et al.  Trading off Cache Capacity for Reliability to Enable Low Voltage Operation , 2008, 2008 International Symposium on Computer Architecture.

[71]  Avesta Sasan,et al.  Process Variation Aware SRAM/Cache for aggressive voltage-frequency scaling , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.