The evolution of dependable computing at the University of Illinois

The University of Illinois has been active in dependable computing research for over 50 years. Its researchers have proposed fundamental ideas and made major contributions in error detection and recovery, fault tolerance middleware, testing and diagnosis, experimental evaluation and benchmarking of system dependability, dependability modeling, and secure system design and validation. This paper traces the origins of these ideas and their development within the University of Illinois, describes their influence on research at other institutions, and outlines current research directions.
