DeSyRe: On-demand system reliability

The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chip (SoCs). As fabrication technology scales down, chips are becoming less reliable, which increases the power and performance costs of fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design. Given these shifts in the technological landscape, current fault-tolerance solutions are expected to introduce excessive overheads in future systems. Moreover, designing and manufacturing a completely defect- and fault-free system would heavily, even prohibitively, increase design, manufacturing, and testing costs, and would also penalize system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design, with well-balanced power, performance, and design costs. To reduce fault-tolerance overheads, only a small fraction of the chip is built to be fault-free; this fault-free part then manages the remaining, fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems that have high safety requirements (assessed against the IEC 61508 functional safety standard) and tight power and performance constraints.
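The central architectural idea, a small fault-free part supervising fault-prone resources, can be illustrated with a toy model. This is a hypothetical sketch, not the DeSyRe implementation: the names `Core` and `ReliableManager`, the simulated bit-flip fault, and the detection policy (duplicated execution on two cores, with a third core casting a majority vote on mismatch) are all assumptions chosen for illustration. The manager object stands in for the fault-free part and is itself assumed never to fail.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Core:
    """Hypothetical fault-prone processing element."""
    core_id: int
    stuck_fault: bool = False  # simulated permanent fault (assumption)
    healthy: bool = True       # health status as tracked by the manager

    def run(self, task: Callable[[int], int], arg: int) -> int:
        result = task(arg)
        # A faulty core corrupts its output by flipping the lowest bit.
        return result ^ 1 if self.stuck_fault else result


@dataclass
class ReliableManager:
    """Stands in for the small fault-free part that manages fault-prone cores."""
    cores: List[Core]
    log: List[str] = field(default_factory=list)

    def _healthy(self) -> List[Core]:
        return [c for c in self.cores if c.healthy]

    def execute(self, task: Callable[[int], int], arg: int) -> Optional[int]:
        healthy = self._healthy()
        if len(healthy) < 2:
            self.log.append("not enough healthy cores")
            return None
        # Run the task on two healthy cores and compare the results.
        a, b = healthy[0], healthy[1]
        ra, rb = a.run(task, arg), b.run(task, arg)
        if ra == rb:
            return ra
        # Mismatch: a third healthy core breaks the tie (majority vote).
        if len(healthy) < 3:
            self.log.append("mismatch, no spare core")
            return None
        rc = healthy[2].run(task, arg)
        if rc == ra:
            b.healthy = False  # stop scheduling work on the outvoted core
            self.log.append(f"core {b.core_id} retired")
            return ra
        if rc == rb:
            a.healthy = False
            self.log.append(f"core {a.core_id} retired")
            return rb
        self.log.append("no majority")
        return None


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    cores = [Core(0), Core(1, stuck_fault=True), Core(2), Core(3)]
    mgr = ReliableManager(cores)
    print(mgr.execute(square, 5))  # core 1 is outvoted and retired
    print(mgr.execute(square, 6))  # now runs on cores 0 and 2
    print(mgr.log)
```

In this toy run, core 1 disagrees with core 0, core 2 breaks the tie, and the manager stops using core 1. Real fault-management policies (detection, diagnosis, and recovery) are far richer; the point here is only that the trusted logic is confined to one small component while the workload runs on untrusted resources.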
