Toward Fault-Tolerant Adaptive Real-Time Distributed Systems

A monitoring approach to constructing fault-tolerant and adaptive real-time systems, based on the fail-signal processor, is described. Low error-detection latency is a primary goal. A fail-signal processor comprises an application processor together with a simple monitoring processor that detects abnormal functional or timing behaviour in the application processor. When it detects such a failure, the monitor issues a failure signal to the other fail-signal processors and resets the application processor. The serviceflow graph, used to specify real-time services, shows how a service is decomposed, designed with redundancy, and structured to meet its time bounds. Information obtained from serviceflow graphs, combined with run-time information provided by the fail-signal processors, permits: (1) forward error recovery from failures in application processors; (2) avoidance or prediction of service timing failures; and (3) reconfiguration with graceful degradation. Avoidance of timing failures is based on adaptive scheduling.
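To make the fail-signal behaviour concrete, the following is a minimal sketch of the monitor's detect, signal, and reset cycle, not the authors' design. It assumes that the application processor reports each step with a timestamp and a control-flow signature, that a timing failure means the gap between reports exceeds a fixed deadline, and that a functional failure means a signature mismatch (a common watchdog-processor check, assumed here). The names `ApplicationProcessor`, `FailSignalMonitor`, `observe`, `deadline`, and `peers` are hypothetical. Time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ApplicationProcessor:
    """Hypothetical stand-in for the monitored application processor."""
    name: str
    resets: int = 0

    def reset(self) -> None:
        # Return the processor to a known initial state (modelled as a counter).
        self.resets += 1


@dataclass
class FailSignalMonitor:
    """Hypothetical monitoring processor. It checks each reported step for
    timing (gap since the last report must not exceed `deadline`) and for
    function (reported signature must equal the expected one)."""
    app: ApplicationProcessor
    peers: List[Callable[[str, str], None]]  # callbacks standing in for other fail-signal processors
    deadline: float
    last_seen: float = 0.0
    signals: List[str] = field(default_factory=list)

    def observe(self, now: float, signature: int, expected: int) -> bool:
        """Check one step reported at time `now`; return True if it is healthy."""
        reason: Optional[str] = None
        if now - self.last_seen > self.deadline:
            reason = "timing"
        elif signature != expected:
            reason = "functional"
        self.last_seen = now
        if reason is None:
            return True
        self._fail(reason)
        return False

    def _fail(self, reason: str) -> None:
        # Issue the failure signal to every peer, then reset the application processor.
        self.signals.append(reason)
        for notify in self.peers:
            notify(self.app.name, reason)
        self.app.reset()


received = []
app = ApplicationProcessor("P1")
mon = FailSignalMonitor(app, peers=[lambda n, r: received.append((n, r))], deadline=10.0)
print(mon.observe(5.0, 0xA5, 0xA5))   # True: on time, signature matches
print(mon.observe(30.0, 0xA5, 0xA5))  # False: 25 time units since last report exceeds 10
print(mon.observe(35.0, 0x3C, 0xA5))  # False: signature mismatch
print(received, app.resets)           # [('P1', 'timing'), ('P1', 'functional')] 2
```

Checking timing before function means a late and wrong step is reported as a timing failure; in this sketch either kind triggers the same signal-and-reset response, which is what lets peers begin forward error recovery without waiting for the failed processor.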
