AN-Encoding Compiler: Building Safety-Critical Systems with Commodity Hardware

We expect commodity hardware to be used in safety-critical applications in the future. At the same time, commodity hardware is expected to become less reliable and more susceptible to soft errors because of decreasing feature sizes and reduced power supply. Thus, software-implemented approaches to deal with unreliable hardware will be needed. To simplify the handling of value failures, we provide failure virtualization: we transform arbitrary value failures caused by erroneous execution into fail-stop failures, which are easier to handle. To this end, we use the arithmetic AN-code because it provides very good error detection capabilities. Arithmetic codes are suitable for protecting commodity hardware because their guarantees can be provided independently of the executing hardware. This paper presents the encoding compiler EC-AN, which applies AN-encoding to arbitrary programs. To our knowledge, this is the first complete software implementation of AN-encoding. Earlier encoding compilers either encode only small parts of applications or trade off safety to enable complete AN-encoding.
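The general idea of an AN-code can be illustrated with a minimal sketch. It is not the EC-AN implementation: the constant `A` and the names `encode`, `check`, `decode`, `add_encoded`, `flip_bit`, and `FailStop` are assumptions chosen for illustration. A value x is stored as A * x, so every valid code word is a multiple of A, and addition keeps that property because A * x + A * y = A * (x + y). A corrupted value that is no longer a multiple of A is detected, and the program stops instead of continuing with a wrong value.

```python
# Illustrative AN-code sketch (assumed names and constant, not EC-AN itself).

A = 58659  # hypothetical odd constant; any odd A > 1 catches single bit flips


class FailStop(Exception):
    """Raised when an invalid code word is detected (fail-stop behaviour)."""


def encode(x: int) -> int:
    """Encode a plain integer as a multiple of A."""
    return A * x


def check(xc: int) -> int:
    """Return xc unchanged if it is a valid code word, otherwise fail-stop."""
    if xc % A != 0:
        raise FailStop(f"invalid code word: {xc}")
    return xc


def decode(xc: int) -> int:
    """Check a code word and recover the plain integer."""
    return check(xc) // A


def add_encoded(xc: int, yc: int) -> int:
    """Add two code words; the sum of two multiples of A is a multiple of A."""
    return check(xc) + check(yc)


def flip_bit(value: int, bit: int) -> int:
    """Simulate a soft error by flipping one bit of a stored value."""
    return value ^ (1 << bit)


if __name__ == "__main__":
    total = add_encoded(encode(20), encode(22))
    print(decode(total))  # the decoded sum of 20 and 22

    corrupted = flip_bit(total, 3)
    try:
        decode(corrupted)
    except FailStop as err:
        print("stopped:", err)
```

A single bit flip changes the stored value by plus or minus a power of two, which is never a multiple of an odd A greater than 1, so the check in this sketch always catches it. Errors that happen to change a code word by a multiple of A go undetected, which is why the choice of A matters in practice.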
