Large-scale content-based publish/subscribe systems

Today, the architecture of distributed computer systems is dominated by client/server platforms relying on synchronous request/reply. This architecture is not well suited to implementing information-driven applications such as news delivery, stock quoting, air traffic control, and the dissemination of auction bids, because of the inherent mismatch between the demands of these applications and the characteristics of those platforms. In contrast, publish/subscribe directly reflects the intrinsic behavior of information-driven applications, because communication is indirect and initiated by the producers of information: producers publish notifications, and a notification service that decouples producers from consumers delivers them to the subscribed consumers. Therefore, publish/subscribe should be the first choice for implementing such applications.

The expressiveness of the notification selection mechanism, which consumers use to describe the notifications they are interested in, is crucial for the flexibility of a notification service. Content-based notification selection is the most expressive because it allows filter predicates to be evaluated over the whole content of a notification. Compared with channel- or subject-based selection, this greater expressiveness yields increased flexibility, which facilitates extensibility and change. On the other hand, scalable implementations of content-based notification services are difficult to realize. Indeed, the expressiveness of notification selection must be chosen carefully in large-scale systems, because expressiveness and scalability are interdependent. Hence, the most fundamental problem in the area of content-based publish/subscribe systems is probably the scalable routing of notifications from their producers to their respective consumers. Unfortunately, existing content-based notification services are not mature enough to be used in large-scale, widely distributed environments.
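To make content-based selection concrete, here is a minimal Python sketch, assuming notifications are flat name/value pairs and filters are conjunctions of attribute constraints. The names `matches`, `subject_matches`, and `OPS` are hypothetical and are not taken from the thesis or from Rebeca. The sketch contrasts evaluating predicates over the whole notification with subject-based selection, which inspects only a single designated field.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Assumed data model: a notification is a flat set of name/value pairs.
Notification = Dict[str, Any]

# Illustrative set of comparison operators usable in attribute constraints.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# A constraint is (attribute name, operator symbol, constant).
Constraint = Tuple[str, str, Any]


def matches(flt: List[Constraint], n: Notification) -> bool:
    """Content-based matching: a conjunctive filter matches a notification
    iff every constraint holds on the corresponding attribute."""
    for attr, op, const in flt:
        if attr not in n:
            return False  # a missing attribute cannot satisfy a constraint
        if not OPS[op](n[attr], const):
            return False
    return True


def subject_matches(subject: str, n: Notification) -> bool:
    """Subject-based selection for comparison: only the single
    'subject' attribute is inspected."""
    return n.get("subject") == subject
```

For a stock quote such as `{"subject": "quotes", "symbol": "IBM", "price": 98.5}`, the filter `[("symbol", "=", "IBM"), ("price", "<", 100)]` matches, whereas a subject-based subscription to `"quotes"` receives every quote and cannot express the price condition at all.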
Most existing notification services are centralized, use flooding, or use simple routing algorithms that assume that each event broker has global knowledge of all active subscriptions. All these approaches exhibit severe scalability problems in large-scale systems. In contrast, this thesis concentrates on mechanisms that improve the scalability of content-based routing algorithms and presents more advanced routing algorithms that do not rely on global knowledge. The algorithms presented here exploit similarities between subscriptions by using identity and covering tests and by merging filters. Identity-based routing is a simplified version of covering-based routing, whereas merging-based routing is more advanced because it exploits filter merging. Furthermore, the idea of imperfect routing algorithms is introduced. The thesis consists of a theoretical and a practical part. The theoretical part presents a formal specification of publish/subscribe systems, a routing framework, and a set of routing algorithms, and discusses how the routing optimizations can be mapped onto a concrete data/filter model. The practical part presents the implementation of the Rebeca notification service, which supports advertisements and all the routing algorithms mentioned above. A detailed practical evaluation of the implemented algorithms, based on the prototype, is also presented.
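The identity and covering tests and filter merging mentioned above can be illustrated on a deliberately simple, hypothetical filter model: a conjunction of closed numeric intervals, one per attribute, all assumed non-empty. This is a sketch under that assumption, not the thesis's algorithms or Rebeca's implementation; `identical`, `covers`, `merge`, and `must_forward` are invented names, and imperfect merging is not shown.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical filter model: {"price": (0, 100)} means 0 <= price <= 100.
# Every interval is assumed to be non-empty (lo <= hi).
Interval = Tuple[float, float]
Filter = Dict[str, Interval]


def identical(f1: Filter, f2: Filter) -> bool:
    """Identity test: in this normalized model, equal filters select
    exactly the same notifications."""
    return f1 == f2


def covers(f1: Filter, f2: Filter) -> bool:
    """Covering test: f1 covers f2 if every notification matching f2 also
    matches f1. Here that holds iff each attribute constrained by f1 is
    also constrained by f2 with a sub-interval."""
    for attr, (lo1, hi1) in f1.items():
        if attr not in f2:
            return False
        lo2, hi2 = f2[attr]
        if lo2 < lo1 or hi2 > hi1:
            return False
    return True


def merge(f1: Filter, f2: Filter) -> Optional[Filter]:
    """Perfect merging: if both filters constrain the same attributes,
    differ in at most one interval, and those intervals overlap, the union
    of their selections is again one conjunctive filter."""
    if f1.keys() != f2.keys():
        return None
    differing = [a for a in f1 if f1[a] != f2[a]]
    if not differing:
        return dict(f1)  # identical filters
    if len(differing) > 1:
        return None  # the union would not be a single conjunction
    attr = differing[0]
    (lo1, hi1), (lo2, hi2) = f1[attr], f2[attr]
    if max(lo1, lo2) > min(hi1, hi2):
        return None  # disjoint intervals: the union is not an interval
    merged = dict(f1)
    merged[attr] = (min(lo1, lo2), max(hi1, hi2))
    return merged


def must_forward(new: Filter, forwarded: List[Filter]) -> bool:
    """Covering-based routing decision: a new subscription must be
    forwarded to a neighbor broker only if no subscription already
    forwarded there covers it."""
    return not any(covers(f, new) for f in forwarded)
```

For example, `{"price": (0, 100)}` covers `{"price": (10, 20), "volume": (1000, 5000)}`, so a broker that has already forwarded the former to a neighbor need not forward the latter; this is how covering reduces routing-table size without global knowledge. Merging `{"price": (0, 50)}` with `{"price": (40, 100)}` yields `{"price": (0, 100)}`, whereas disjoint intervals cannot be merged perfectly in this model.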