Improving the fault resilience of an H.264 decoder using static analysis methods

Fault tolerance is rapidly becoming one of the most significant design objectives for embedded systems as semiconductor structure sizes and supply voltages shrink. However, resource-constrained systems cannot afford the overhead and cost of traditional error correction. New methods are required that sustain acceptable service quality in the presence of errors while avoiding crashes. We present a flexible fault-tolerance approach that selects correction actions according to error semantics, using application annotations and static analysis. We validate the approach by analyzing the vulnerability of an H.264 decoder and improving its reliability with flexible error handling.
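
The idea of selecting a correction action from the semantics of the affected data can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-level classification (reliable vs. unreliable data) and a table of annotated memory regions; the region names, addresses, and the `select_action` function are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Reliability(Enum):
    RELIABLE = "reliable"      # assumed: an error here could crash the decoder
    UNRELIABLE = "unreliable"  # assumed: an error here only degrades output quality


class Action(Enum):
    IGNORE = "ignore"    # tolerate the error and accept a quality loss
    CORRECT = "correct"  # repair the data before it is used again


@dataclass(frozen=True)
class Region:
    name: str
    start: int
    size: int
    reliability: Reliability


# Hypothetical annotation table. In an annotation-based approach this
# information would come from the application source and static analysis;
# here it is written out by hand.
REGIONS = [
    Region("frame_buffer", 0x1000, 0x4000, Reliability.UNRELIABLE),
    Region("slice_header", 0x5000, 0x0100, Reliability.RELIABLE),
]


def select_action(fault_address, regions=REGIONS):
    """Pick a correction action for an error detected at fault_address."""
    for region in regions:
        if region.start <= fault_address < region.start + region.size:
            if region.reliability is Reliability.UNRELIABLE:
                return Action.IGNORE
            return Action.CORRECT
    # Data without an annotation is treated conservatively as reliable.
    return Action.CORRECT
```

In this sketch an error in the hypothetical frame buffer is ignored, since a corrupted pixel only affects picture quality, while an error in the hypothetical slice header triggers correction because the decoder's control flow depends on it.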
