Reliable and Highly Available Distributed Publish/Subscribe Service

This paper develops reliable distributed publish/subscribe algorithms that keep the service available in the face of concurrent crash failures of up to $\delta$ brokers. Reliability in our context refers to per-source in-order and exactly-once delivery of publications to matching subscribers. To handle failures, brokers maintain data structures that enable them to reconnect the topology and compute new forwarding paths on the fly. This enables fast reaction to failures and improves the system's availability. Moreover, we present a recovery procedure that recovering brokers execute in order to re-enter the system and synchronize their routing information.
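The delivery guarantee stated above (per-source in-order, exactly-once) can be illustrated with a small sketch. This is not the paper's algorithm; it assumes, purely for illustration, that each publisher stamps its publications with consecutive sequence numbers starting at 1, and the class and method names (`InOrderDeliverer`, `receive`) are hypothetical.

```python
from collections import defaultdict


class InOrderDeliverer:
    """Hypothetical subscriber-side filter (an assumption, not the paper's design).

    Assumes every source numbers its publications 1, 2, 3, ... so that
    gaps and duplicates can be detected per source.
    """

    def __init__(self):
        # Next sequence number expected from each source.
        self.next_seq = defaultdict(lambda: 1)
        # Out-of-order publications buffered per source, keyed by sequence number.
        self.pending = defaultdict(dict)
        # Everything handed to the application, in delivery order.
        self.delivered = []

    def receive(self, source, seq, payload):
        """Accept one publication; return the list of publications it releases."""
        # Exactly-once: drop anything already delivered or already buffered.
        if seq < self.next_seq[source] or seq in self.pending[source]:
            return []
        self.pending[source][seq] = payload

        # In-order: release the longest gap-free run starting at next_seq.
        released = []
        while self.next_seq[source] in self.pending[source]:
            n = self.next_seq[source]
            released.append((source, n, self.pending[source].pop(n)))
            self.next_seq[source] = n + 1
        self.delivered.extend(released)
        return released


if __name__ == "__main__":
    d = InOrderDeliverer()
    print(d.receive("p1", 2, "b"))  # buffered until seq 1 arrives
    print(d.receive("p1", 1, "a"))  # releases seq 1 and seq 2 in order
    print(d.receive("p1", 1, "a"))  # duplicate, dropped
```

Ordering is tracked independently per source, so a gap in one publisher's stream does not delay publications from another. A duplicate arriving over a recomputed forwarding path after a broker failure would be dropped by the sequence-number check under these assumptions.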
