Finite-state machine embeddings for non-concurrent error detection and identification

In digital sequential systems that operate over several time steps, a state-transition fault at any time step corrupts the system state in a way that can render the system's future functionality useless. In this paper, we develop a methodology for systematically constructing redundant finite-state machines (FSMs) so that an external checker can capture transient state-transition faults via checks performed in a non-concurrent manner (e.g., periodically). More specifically, the proposed approach allows the checker to detect and identify errors caused by past state-transition faults by analyzing the current, possibly corrupted, FSM state. As a result, the checker in such designs can operate at a slower speed than the rest of the system, which relaxes the stringent requirements on its reliability.
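The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea under our own assumptions. It restricts attention to a *linear* FSM over GF(2), a special case, and the matrices `A`, `B`, `C`, `SIGMA` and every function name are hypothetical illustrative choices, not the paper's method. A 3-bit machine is embedded into a 5-bit redundant machine whose valid states satisfy the parity check H ξ = 0. The redundant dynamics are chosen so that the syndrome evolves as s[t+1] = Σ s[t] with Σ invertible. A fault that pushes the state off the valid subset therefore stays visible until a periodic checker examines the current state.

```python
"""Toy non-concurrent error detection in a redundant linear FSM over GF(2).

All matrices below are hypothetical illustrative choices, not taken from the paper.
"""

# Original machine: q[t+1] = A q[t] + B x[t]  (3 state bits, 1 input bit)
A = [[0, 1, 0],
     [0, 0, 1],
     [1, 1, 0]]
B = [[1],
     [0],
     [0]]
# Parity map: a redundant state xi = [q; r] is valid iff r = C q
C = [[1, 1, 0],
     [0, 1, 1]]
# Invertible over GF(2) (its cube is the identity), so a nonzero syndrome never returns to zero
SIGMA = [[0, 1],
         [1, 1]]


def mat_vec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]


def mat_mul(M, N):
    """Matrix-matrix product over GF(2)."""
    cols = list(zip(*N))
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in cols] for row in M]


def mat_add(M, N):
    """Elementwise sum over GF(2), i.e. XOR."""
    return [[a ^ b for a, b in zip(r, s)] for r, s in zip(M, N)]


def hstack(M, N):
    """Place N to the right of M."""
    return [r + s for r, s in zip(M, N)]


K, P = len(A), len(C)
ZERO = [[0] * P for _ in range(K)]
IDENT = [[int(i == j) for j in range(P)] for i in range(P)]

# Redundant machine xi[t+1] = A_R xi[t] + B_R x[t]. The lower block row is chosen so that
# the syndrome s = H xi obeys s[t+1] = SIGMA s[t], independently of the input.
A_R = hstack(A, ZERO) + hstack(mat_add(mat_mul(C, A), mat_mul(SIGMA, C)), SIGMA)
B_R = B + mat_mul(C, B)
H = hstack(C, IDENT)  # parity check: H xi = 0 exactly on valid redundant states


def encode(q):
    """Map an original state q to its valid redundant state [q; C q]."""
    return q + mat_vec(C, q)


def step(xi, x):
    """One fault-free transition of the redundant machine on input bit x."""
    return [a ^ b for a, b in zip(mat_vec(A_R, xi), mat_vec(B_R, [x]))]


def syndrome(xi):
    """Checker: a nonzero syndrome means some past fault left the valid subset."""
    return mat_vec(H, xi)


def run(inputs, q0, fault_time=None, fault_bit=0):
    """Simulate; optionally flip one redundant-state bit right after step fault_time."""
    xi = encode(q0)
    for t, x in enumerate(inputs):
        xi = step(xi, x)
        if t == fault_time:
            xi[fault_bit] ^= 1  # hypothetical transient state-transition fault
    return xi


if __name__ == "__main__":
    inputs = [1, 0, 1, 1, 0, 0, 1, 0]
    print("fault-free:", syndrome(run(inputs, [0, 0, 0])))
    print("fault at t=2:", syndrome(run(inputs, [0, 0, 0], fault_time=2, fault_bit=1)))
```

Because H A_R = Σ H and H B_R = 0, the checker only ever needs the current redundant state: a fault-free run ends with a zero syndrome, and any single-bit fault at any earlier step leaves a nonzero one. Identification would additionally require that distinct (faulty bit, elapsed time) pairs yield distinct syndromes. This toy, with only two parity bits, cannot tell bit 0 from bit 3 (they share a column of H), and since Σ has period 3 it recovers the elapsed time only modulo 3.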