Coding approaches to fault tolerance in linear dynamic systems

This paper discusses fault tolerance in discrete-time dynamic systems, such as finite-state controllers or computer simulations, with a focus on using coding techniques to provide fault tolerance efficiently to linear finite-state machines (LFSMs). Traditional fault tolerance schemes rely heavily, particularly for dynamic systems that operate over extended time horizons, on the assumption that the error-correcting mechanism is fault free. In contrast, we are interested in the case where all components of the implementation are fault prone. The paper starts with a paradigmatic fault tolerance scheme that systematically adds redundancy to a discrete-time dynamic system so as to achieve tolerance to transient faults in both the state-transition and the error-correcting mechanisms. By combining this methodology with low-complexity error-correcting coding, we then obtain an efficient way of providing fault tolerance to k identical unreliable LFSMs that operate in parallel on distinct input sequences. Provided k is sufficiently large, the overall construction requires only a constant amount of redundant hardware per machine to achieve an arbitrarily small probability of overall failure over any prespecified (finite) time interval, which in turn yields a lower bound on the computational capacity of unreliable LFSMs.
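The coding idea behind the parallel construction can be illustrated with a small Python/NumPy sketch. It rests on several simplifying assumptions. There are k = 4 identical LFSMs over GF(2) with a hypothetical 3-bit state matrix `A` and input matrix `B`. They are protected by n = 7 machines whose states, read row by row, are codewords of a Hamming(7,4) code. The Hamming code stands in for the low-complexity codes used in the paper. The fault model (one machine hit per step) is hypothetical. Unlike the setting of the paper, the correction step `correct` is assumed fault free. What the sketch shows is that linearity keeps encoded states encoded: if Q_c[t] = Q[t] G, then Q_c[t+1] = A Q_c[t] + B X[t] G = Q[t+1] G (mod 2).

```python
import numpy as np

rng = np.random.default_rng(0)

# Systematic Hamming(7,4) code over GF(2): generator G (4x7), parity check H (3x7).
# Assumption: a stand-in for the low-complexity codes the paper actually uses.
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=int)
G = np.hstack([np.eye(4, dtype=int), P])   # 4 x 7
H = np.hstack([P.T, np.eye(3, dtype=int)])  # 3 x 7, so G @ H.T = 0 (mod 2)

# Hypothetical LFSM with 3 state bits and 1 input bit: q[t+1] = A q[t] + B x[t] (mod 2).
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]], dtype=int)
B = np.array([[1], [0], [0]], dtype=int)

def step(Q, X):
    """Advance every machine (one machine per column of Q) by one step over GF(2)."""
    return (A @ Q + B @ X) % 2

def correct(Qc):
    """Syndrome-decode each row of the d x n encoded state, fixing one error per row.

    Simplification: this corrector is fault free, unlike in the paper."""
    Qc = Qc.copy()
    for r in range(Qc.shape[0]):
        s = (H @ Qc[r]) % 2
        if s.any():
            # A single flipped bit at position j produces syndrome = column j of H.
            j = int(np.flatnonzero((H.T == s).all(axis=1))[0])
            Qc[r, j] ^= 1
    return Qc

k, n, d, T = 4, 7, 3, 20
Q = np.zeros((d, k), dtype=int)  # fault-free reference states of the k original machines
Qc = (Q @ G) % 2                 # states of the n redundant machines (rows are codewords)

for t in range(T):
    X = rng.integers(0, 2, size=(1, k))  # a distinct input bit for each original machine
    Q = step(Q, X)
    Qc = step(Qc, (X @ G) % 2)           # redundant machines run on encoded inputs
    faulty = rng.integers(n)             # hypothetical fault model: one machine per step
    Qc[:, faulty] ^= rng.integers(0, 2, size=d)  # transient fault flips some of its state bits
    Qc = correct(Qc)
    # Systematic code: the first k columns hold the original machines' states.
    assert np.array_equal(Qc[:, :k], Q)

print("all", T, "steps recovered")
```

Because a fault confined to one machine corrupts at most one entry in each row of the encoded state, per-row single-error correction restores every row to a codeword. The redundant machines need no knowledge of the code beyond receiving encoded inputs. All decoding work sits in the periodic correction step, which is the component the paper treats as unreliable as well.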
