Towards Embedded RAIDs-on-Chip

The dual effects of larger die sizes and technology scaling, combined with aggressive voltage scaling for power reduction, increase the error rates of on-chip memories. Traditional on-chip memory reliability techniques (e.g., ECC) incur significant power and performance overheads. In this paper, we propose an Embedded RAID (E-RAID) strategy with low power and performance overheads and present Embedded RAIDs-on-Chip (E-RoC), a distributed, dynamically managed, reliable memory subsystem. E-RoC achieves reliability through redundancy by optimizing RAID-like policies tuned for on-chip distributed memories. We achieve on-chip memory reliability through the use of Distributed Dynamic ScratchPad Allocatable Memories (DSPAMs) and their allocation policies. We exploit aggressive voltage scaling to reduce the power overheads of parallel DSPAM accesses, and rely on the E-RoC manager to automatically handle any resulting voltage-scaling-induced errors. We demonstrate how E-RAIDs can further enhance the fault tolerance of traditional memory reliability approaches by designing E-RAID levels that exploit ECC. Finally, we show the power and flexibility of the E-RoC concept by demonstrating the benefits of heterogeneous E-RAID levels that fit each application's needs (fault tolerance, power/energy, performance). Our experimental results on multimedia benchmarks show that E-RoC's fully distributed redundant reliable memory subsystem can reduce dynamic power consumption by up to 85% and the latency due to error checks/corrections by up to 61%. On average, our E-RAID levels converge to 100% yield much faster than traditional ECC approaches. Moreover, E-RAID levels that exploit ECC (e.g., E-RAID ECC + 1, E-RAID RP + ECC) can guarantee 99.9% yield at ultra-low Vdd on average, whereas SECDED and DECTED attained 99.1% and 99.4% yield, respectively. 
Our E-RAID levels (detection and correction) achieved a worst-case yield of 93.9%, whereas the traditional ECC approaches achieved a worst-case yield of 34.1%. We observe an average 22% increase in dynamic power consumption with traditional ECC approaches (EDC1, EDC8, SEC, SECDED, DEC, DECTED), whereas we observe average savings of … ECC, E-RAID TMR). On average, traditional ECC approaches reduce static energy by 6.4%, whereas our E-RAID approaches achieve 23.4% static energy savings. We observe that on average … TMR) incur 2% higher overheads than traditional ECC approaches (EDC1, EDC8, SEC, SECDED, DEC, DECTED). For Vdd above 0.45, our E-RAID levels with error correction support (SEC) incur, on average, 3% lower overheads than the more traditional SECDED/DECTED schemes. Finally, we observe that mixing …
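
As an illustration of the redundancy ideas named above (TMR and RAID-like parity), the following is a minimal Python sketch, not the E-RoC implementation. It models each on-chip memory block as a `bytes` object; the function names `tmr_vote`, `xor_parity`, and `rebuild` are hypothetical and do not come from the paper. The sketch assumes all blocks are of equal size and, for the parity case, that at most one block is lost and its position is known.

```python
from functools import reduce
from typing import Sequence


def tmr_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise majority vote over three replicas of the same block.

    Each output bit takes the value held by at least two replicas, so a
    fault confined to one replica is masked.
    """
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


def xor_parity(blocks: Sequence[bytes]) -> bytes:
    """Compute a RAID-style XOR parity block over equally sized blocks."""
    return bytes(reduce(lambda p, q: p ^ q, column) for column in zip(*blocks))


def rebuild(blocks: Sequence[bytes], parity: bytes, lost: int) -> bytes:
    """Reconstruct the block at index `lost` from the survivors and parity.

    XOR-ing the parity with every surviving block cancels their
    contributions and leaves the missing block.
    """
    survivors = [blk for i, blk in enumerate(blocks) if i != lost]
    return xor_parity(survivors + [parity])


# Hypothetical usage: one replica suffers a single-bit flip.
golden = b"\x12\x34\x56\x78"
faulty = bytes([golden[0] ^ 0x08]) + golden[1:]
recovered = tmr_vote(golden, faulty, golden)

# Hypothetical usage: three data blocks plus one parity block.
data_blocks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity_block = xor_parity(data_blocks)
restored = rebuild(data_blocks, parity_block, lost=1)
```

The two schemes trade storage for correction capability differently: triplication stores three full copies and masks a fault in any one of them without knowing where it is, whereas parity stores one extra block per group but needs a separate mechanism (for example, a detection code) to identify which block failed.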
