Topology-Aware Algorithms for Large-Scale Communication

When designing communication protocols, there is always a tradeoff between generality and performance. This chapter reports one approach to achieving the right balance between these two aspects. It uses a network model that applies to most existing large-scale networks, which are built from reliable high-speed local-area networks interconnected by slower long-haul connections. The approach consists of making the relevant topological aspects of the underlying network infrastructure visible to the protocol designer. It is illustrated by several algorithms that use topology information to improve performance.
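The chapter's own algorithms are not reproduced here. The sketch below shows why exposing topology can pay off. It assumes that each process belongs to exactly one LAN (called a site) and that any process can relay messages within its site. It compares two ways to multicast a message. The topology-oblivious version sends one copy to every destination. The topology-aware version sends a single copy per remote site across the long-haul link and lets a relay forward it locally. The function names and the relay choice (the lowest-named process of each site) are hypothetical choices made for illustration.

```python
from collections import defaultdict


def group_by_site(site_of):
    """Map each site (LAN) to the sorted list of processes it hosts."""
    sites = defaultdict(list)
    for proc, site in site_of.items():
        sites[site].append(proc)
    return {s: sorted(ps) for s, ps in sites.items()}


def flat_multicast(sender, site_of):
    """Topology-oblivious: one point-to-point message per destination."""
    return [(sender, p) for p in sorted(site_of) if p != sender]


def topology_aware_multicast(sender, site_of):
    """Send one copy per remote site over the long-haul link to a relay
    (hypothetically, the lowest-named process of that site), which then
    forwards it to the other processes of its LAN."""
    sites = group_by_site(site_of)
    home = site_of[sender]
    hops = []
    for site, procs in sorted(sites.items()):
        relay = sender if site == home else procs[0]
        if site != home:
            hops.append((sender, relay))  # this hop crosses a long-haul link
        # Local forwarding inside the site; skip the sender and the relay.
        hops.extend((relay, p) for p in procs if p not in (sender, relay))
    return hops


def wan_messages(hops, site_of):
    """Count the hops whose endpoints lie in different sites."""
    return sum(1 for src, dst in hops if site_of[src] != site_of[dst])


if __name__ == "__main__":
    site_of = {f"{s.lower()}{i}": s for s in "ABC" for i in (1, 2, 3)}
    print(wan_messages(flat_multicast("a1", site_of), site_of))
    print(wan_messages(topology_aware_multicast("a1", site_of), site_of))
```

Take three sites with three processes each and a sender in site A. The flat scheme puts six messages on long-haul links, and the topology-aware scheme puts two. Both deliver exactly one copy to each of the eight other processes. The saving grows with the number of processes per remote site, because the slow links carry traffic proportional to the number of sites rather than the number of processes.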
