SUVS: a distributed real-time system testbed for fault-tolerant computing

A distributed real-time system testbed to support experimental research has been established at the University of Texas at Arlington (UTA). The testbed, named SUVS (Simplified Unmanned Vehicle System), is being used to conduct experimental evaluation of techniques and methods for the design of reliable distributed real-time systems. Thus far, SUVS has been used successfully mainly for experiments with system-level fault tolerance techniques. SUVS is also planned as a vehicle for clinical study of the specification, design, and implementation methods being investigated at UTA for real-time distributed/parallel systems. The software part of SUVS consists of a set of sensor tasks, analyzer tasks, and actuator tasks. The first version of SUVS was implemented using Verdix Ada on a MicroVAX 3900/Ultrix. The second version, reported here, is written in C and runs on a network of eight SUN workstations. The need for cooperation among tasks in error detection and recovery is the major factor behind choosing the conversation scheme for the fault-tolerant SUVS (FT-SUVS). A classic TMR-like voting scheme has also been implemented for FT-SUVS. We demonstrate how multiple versions of the software can be generated in a systematic way for this type of application. The target implementation for FT-SUVS is a hybrid parallel architecture, which is simulated on a network of SUN workstations. The implementation details of FT-SUVS and preliminary timing and reliability measurements (in the presence of various injected faults) are discussed. These results indicate that the FT-SUVS framework is very useful in developing experimental approaches for the design of reliable distributed real-time systems.
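As an illustration of the TMR-like voting scheme mentioned in the abstract, the following is a minimal sketch of a three-way majority voter in C. It is not taken from FT-SUVS: the type `reading_t`, the function `tmr_vote`, and the use of exact integer comparison are all assumptions made for this example, since the abstract does not describe the data the replicated tasks exchange or how agreement is decided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: the abstract does not say what the
 * replicated tasks actually output. */
typedef int reading_t;

/*
 * Majority vote over three replica outputs (assumed exact-match comparison).
 * Returns true and stores the agreed value in *out when at least two
 * replicas match; returns false when all three disagree.
 */
bool tmr_vote(reading_t a, reading_t b, reading_t c, reading_t *out)
{
    assert(out != NULL);

    if (a == b || a == c) {   /* a agrees with at least one other replica */
        *out = a;
        return true;
    }
    if (b == c) {             /* a is the odd one out */
        *out = b;
        return true;
    }
    return false;             /* no majority: the fault is detected but not masked */
}
```

A single faulty replica is masked because the other two still agree. When all three outputs differ, the voter can only report the disagreement, and the caller has to fall back on some other recovery action.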
