Cross-layer early reliability evaluation: Challenges and promises

Evaluation of computing system reliability must be accurate enough to guide the choice of fault protection mechanisms that guarantee correct operation at acceptable cost. To be useful, reliability evaluation must be performed early in the design cycle, when, however, most details of the system are still unknown. This inherent contradiction, early vs. accurate, calls for a cross-layer approach to reliability evaluation. Different layers of abstraction contribute differently to the overall system reliability; if each contribution can be assessed independently, the reliability of the system can be evaluated at the early stages of the design. We review the state of the art in the area and discuss the corresponding challenges.
