The Evolution of Fault-Tolerant Computing at the University of Illinois

The University of Illinois has been active in fault-tolerant computing research for over 25 years. Researchers there have proposed fundamental ideas and made major contributions in the areas of testing and diagnosis, concurrent error detection, and fault tolerance. This paper traces the origins of these ideas and their development within the University of Illinois, as well as their influence on research at other institutions, and outlines current directions of research.
