The timely computing base: Timely actions in the presence of uncertain timeliness

Real-time behavior is specified in compliance with timeliness requirements, which in essence calls for synchronous system models. However, systems often rely on unpredictable and unreliable infrastructures, which suggests the use of asynchronous models. Several models have been proposed to address this issue. We propose an architectural construct that takes a generic approach to the problem of programming in the presence of uncertain timeliness. We assume the existence of a component capable of executing timing functions, which helps applications with varying degrees of synchrony to behave reliably despite the occurrence of timing failures. We call this component the Timely Computing Base (TCB). This paper describes the TCB architecture and model, and discusses the application programming interface for accessing the TCB services. The implementation of the TCB services uses fail-awareness techniques to increase the coverage of TCB properties.
