An Approach to Experimental Evaluation of Real-Time Fault-Tolerant Distributed Computing Schemes

A testbed-based approach to the evaluation of fault-tolerant distributed computing schemes is discussed. The approach is based on experimental incorporation of system structuring and design techniques into real-time distributed computing testbeds centered around tightly coupled microcomputer networks. The effectiveness of this approach has been experimentally confirmed. Primary advantages of the testbed-based approach include the relatively high accuracy of the data obtained on timing and logical complexity, as well as the relatively high degree of assurance that can be obtained on the practical effectiveness of the scheme evaluated. Various design issues encountered in the course of establishing the basic microcomputer network testbed facilities are discussed, along with the augmentation of those facilities to support some experiments. The recognized shortcomings of the testbeds are also discussed, together with the desired extensions of the testbeds. Some of the desired extensions are beyond the state of the art in microcomputer network implementation.
