ARMADA Middleware and Communication Services

Real-time embedded systems have evolved over the past several decades from small custom-designed digital hardware to large distributed processing systems. As these systems become more complex, their interoperability, evolvability, and cost-effectiveness requirements motivate the use of commercial off-the-shelf components. This raises the challenge of constructing dependable and predictable real-time services for application developers on top of inexpensive hardware and software components that have minimal support for timeliness and dependability guarantees. We are addressing this challenge in the ARMADA project. ARMADA is a set of communication and middleware services that provide support for fault tolerance and end-to-end guarantees for embedded real-time distributed applications. Since the real-time performance of such applications depends heavily on the communication subsystem, the first thrust of the project is to develop a predictable communication service and architecture to ensure QoS-sensitive message delivery. Fault tolerance is of paramount importance to embedded safety-critical systems. In its second thrust, ARMADA aims to offload the complexity of developing fault-tolerant applications from the application programmer through a collection of modular, composable middleware services for fault-tolerant group communication and replication under timing constraints. Finally, we develop tools for testing and validating the behavior of our services. We give an overview of the ARMADA project, describing its architecture and presenting its implementation status.
