End-to-end support for joins in large-scale publish/subscribe systems

We address the problem of supporting a large number of select-join subscriptions for wide-area publish/subscribe. Subscriptions are joins over different tables, with varying interests expressed as range selection conditions over table attributes. Naive schemes, such as computing and sending join results from a server, are inefficient: they produce redundant data and cannot share dissemination costs across subscribers and events. We propose a novel, scalable scheme that group-processes and disseminates a general mix of multi-way select-join subscriptions. We also propose a simple, application-agnostic extension to content-driven networks (CN) that further improves sharing of dissemination costs. Experimental evaluations show that our schemes can generate orders of magnitude lower network traffic at very low processing cost. Our extension to CN can reduce traffic by another order of magnitude, with almost no increase in notification latency.
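
To make the redundancy of the naive scheme concrete, the following is a minimal sketch, not the scheme proposed in this work. It assumes a hypothetical two-table join (R and S on a shared `key`), with each subscription expressed as inclusive range conditions on R.a and S.b. All names (`Subscription`, `matches`, `naive_notifications`) and the sample rows are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical select-join subscription over R JOIN S ON key."""
    name: str
    a_range: tuple  # inclusive (low, high) range on R.a
    b_range: tuple  # inclusive (low, high) range on S.b


def matches(sub, r, s):
    """True if the pair (r, s) joins on key and satisfies both range conditions."""
    return (
        r["key"] == s["key"]
        and sub.a_range[0] <= r["a"] <= sub.a_range[1]
        and sub.b_range[0] <= s["b"] <= sub.b_range[1]
    )


def naive_notifications(subs, r_rows, s_rows):
    """Naive scheme: the server evaluates every subscription separately
    and sends one message per (subscriber, join result) pair."""
    messages = []
    for sub in subs:
        for r in r_rows:
            for s in s_rows:
                if matches(sub, r, s):
                    messages.append((sub.name, r["id"], s["id"]))
    return messages


# Illustrative data (assumed, not from the paper).
r_rows = [
    {"id": "r1", "key": 1, "a": 10},
    {"id": "r2", "key": 1, "a": 20},
]
s_rows = [{"id": "s1", "key": 1, "b": 5}]
subs = [
    Subscription("A", (0, 15), (0, 10)),
    Subscription("B", (5, 25), (0, 10)),
    Subscription("C", (15, 30), (0, 10)),
]

messages = naive_notifications(subs, r_rows, s_rows)
distinct_results = {(r_id, s_id) for _, r_id, s_id in messages}
print(len(messages), "messages for", len(distinct_results), "distinct join results")
```

In this toy setting the single S tuple is shipped once per message, even though only two distinct join results exist. Overlapping range conditions across subscribers are what make this duplication grow with the number of subscriptions.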