Decentralized Message Ordering for Publish/Subscribe Systems

We describe a method for ordering messages across groups in a publish/subscribe system without centralized control or large vector timestamps. We show that the scheme is practical, because it requires little state; scalable, because the maximum message load is limited by receivers; and efficient, because the paths messages traverse to be ordered are not much longer than necessary. Our insight is that only messages to groups that overlap in membership can be observed to arrive out of order. Sequencing messages to these groups is therefore sufficient to provide a consistent order, and when publishers subscribe to the groups to which they send, this order is also a causal order.
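
The overlap insight can be illustrated with a small sketch. It is not the paper's algorithm, only a way to identify which pairs of groups share subscribers and would therefore need their messages sequenced relative to each other. The function name `overlapping_group_pairs`, the `membership` mapping, and the sample subscribers are all hypothetical, chosen for illustration.

```python
from itertools import combinations

def overlapping_group_pairs(membership):
    """Return the pairs of group names whose subscriber sets intersect.

    `membership` maps a group name to the set of its subscribers.
    Assumption: any shared subscriber counts as an overlap, following
    the abstract's wording; this mapping is a hypothetical input format.
    """
    pairs = []
    for a, b in combinations(sorted(membership), 2):
        if membership[a] & membership[b]:  # non-empty intersection
            pairs.append((a, b))
    return pairs

# Hypothetical example: g1 and g2 share "bob"; g3 is disjoint from both.
membership = {
    "g1": {"alice", "bob"},
    "g2": {"bob", "carol"},
    "g3": {"dave"},
}
print(overlapping_group_pairs(membership))  # [('g1', 'g2')]
```

In this example only messages to g1 and g2 could be seen in conflicting orders, so g3 needs no coordination with either group.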
