Non-Concurrent Error Detection and Correction in Fault-Tolerant Discrete-Time LTI Dynamic Systems

This paper develops resource-efficient alternatives to modular redundancy for fault-tolerant discrete-time (DT) linear time-invariant (LTI) dynamic systems. The proposed method extends previous approaches that embed the state of a given DT LTI dynamic system into the redundant state space of a DT LTI dynamic system of higher state dimension. These embeddings, like the embeddings studied in this paper, preserve the state evolution of the original system in a linearly encoded form. They allow error detection and correction to be performed through concurrent parity checks (i.e., parity checks that are evaluated at the end of each time step). The novelty of the approach developed in this paper lies in choosing the redundant dynamics of the fault-tolerant implementation so that the parity checks capture the evolution of errors in the system. Non-concurrent parity checks (e.g., parity checks that are evaluated periodically) can then uniquely determine the initial value of each error, the time step at which it occurred, and the state variable it originally affected. The resulting error detection, identification, and correction procedures can be performed periodically and can significantly reduce the overhead, complexity, and reliability requirements on the checking mechanism.
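
The idea of letting the parity checks "capture the evolution of errors" can be illustrated with a small toy model. This is a sketch under stated assumptions, not the paper's construction. It assumes a common parametrization of state embeddings: the redundant state is xi[t] = G q[t] with P G = 0, and the redundant dynamics A_r are chosen so that the parity vector p[t] = P xi[t] evolves autonomously as p[t+1] = SIGMA p[t]. A single transient error of value alpha in redundant state variable i at time step t0 then leaves p[T] = alpha SIGMA^(T - t0) P[:, i] at the check time T. The matrices G, H, L, P, SIGMA, the helper names (run, diagnose, correct), and the brute-force decoder are all hypothetical illustrative choices. SIGMA = diag(1/2, 1/3) and P are picked so that the ratio of the two parity components, 2^k 3^(i - k) with k = T - t0, differs for every (t0, i) pair. That makes a single error identifiable from one check per period. Exact rational arithmetic keeps the parallelism test free of round-off.

```python
from fractions import Fraction as F

# Toy sketch: hypothetical matrices, at most one transient error per checking period.

def matmul(X, Y):
    """Exact product of matrices stored as lists of rows."""
    return [[sum((X[r][k] * Y[k][c] for k in range(len(Y))), F(0))
             for c in range(len(Y[0]))] for r in range(len(X))]

def matadd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def matpow(X, k):
    """X**k for a square matrix X and an integer k >= 0."""
    R = [[F(int(r == c)) for c in range(len(X))] for r in range(len(X))]
    for _ in range(k):
        R = matmul(R, X)
    return R

def col(X, i):
    """Column i of X as a column vector."""
    return [[row[i]] for row in X]

# Original system q[t+1] = A q[t] + B x[t] (n = 2 states, scalar input).
A = [[F(1, 2), F(1, 4)], [F(0), F(3, 4)]]
B = [[F(1)], [F(1)]]

# Hypothetical embedding into eta = 4 states with d = 2 parity checks:
# P G = 0, P H = I, L G = I, L H = 0 (so q[t] = L xi[t]).
P = [[F(1), F(1), F(1), F(1)], [F(1), F(3), F(9), F(27)]]
G = [[F(3), F(12)], [F(-4), F(-13)], [F(1), F(0)], [F(0), F(1)]]
H = [[F(3, 2), F(-1, 2)], [F(-1, 2), F(1, 2)], [F(0), F(0)], [F(0), F(0)]]
L = [[F(0), F(0), F(1), F(0)], [F(0), F(0), F(0), F(1)]]
SIGMA = [[F(1, 2), F(0)], [F(0), F(1, 3)]]  # chosen parity (redundant) dynamics

# Redundant system: A_r G = G A and P A_r = SIGMA P, hence p[t+1] = SIGMA p[t].
A_r = matadd(matmul(matmul(G, A), L), matmul(matmul(H, SIGMA), P))
B_r = matmul(G, B)

def run(q0, inputs, fault=None):
    """Simulate xi for len(inputs) steps; fault = (t0, i, alpha) adds alpha
    to state variable i of xi[t0]."""
    xi = matmul(G, q0)  # xi[0] = G q[0]
    for t in range(len(inputs) + 1):
        if fault is not None and fault[0] == t:
            _, i, alpha = fault
            xi[i][0] += alpha  # transient error corrupts xi[t]
        if t < len(inputs):
            xi = matadd(matmul(A_r, xi), matmul(B_r, [[inputs[t]]]))
    return xi

def diagnose(xi, T):
    """Non-concurrent check at time T: None if the parity is zero, else the
    unique (t0, i, alpha) with P xi[T] = alpha SIGMA^(T - t0) P[:, i]."""
    p = matmul(P, xi)
    if p[0][0] == 0 and p[1][0] == 0:
        return None
    matches = []
    for k in range(T + 1):
        Sk = matpow(SIGMA, k)
        for i in range(len(P[0])):
            s = matmul(Sk, col(P, i))  # parity signature; s[0][0] != 0 in this toy
            if p[0][0] * s[1][0] - p[1][0] * s[0][0] == 0:  # p parallel to s
                matches.append((T - k, i, p[0][0] / s[0][0]))
    if len(matches) != 1:
        raise ValueError("error is not uniquely identifiable")
    return matches[0]

def correct(xi, T, diagnosis):
    """Remove the error's propagated effect A_r^(T - t0) e from xi[T]."""
    t0, i, alpha = diagnosis
    e = [[alpha if r == i else F(0)] for r in range(len(xi))]
    effect = matmul(matpow(A_r, T - t0), e)
    return [[a[0] - b[0]] for a, b in zip(xi, effect)]

q0 = [[F(1)], [F(-2)]]
inputs = [F(t % 3) for t in range(6)]  # checking period T = 6 steps
T = len(inputs)
clean = run(q0, inputs)
faulty = run(q0, inputs, fault=(2, 1, F(5)))
diag = diagnose(faulty, T)
print(diag)                       # (2, 1, Fraction(5, 1))
fixed = correct(faulty, T, diag)
print(fixed == clean)             # True
print(matmul(L, fixed))           # decoded q[T]
```

Only one parity evaluation is made per period of T steps, yet the error's time step, location, and value are all recovered. The linearity of the redundant system then lets the propagated error be subtracted exactly. A floating-point version would need a tolerance in the parallelism test. The exhaustive search also grows with the period T and the redundant state dimension, so it is meant only to make the identifiability argument concrete.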
