H-Tree: An Efficient Index Structure for Event Matching in Content-Based Publish/Subscribe Systems

Content-based publish/subscribe systems are used in many applications to handle complex distributed information flows. Event matching, which searches the space formed by all subscriptions, is a fundamental component of such large-scale systems. As a system grows in scale and complexity, the efficiency of event matching becomes more critical to overall performance. However, most existing methods degrade significantly when both the number of subscriptions and the number of constraints per subscription are large. In this paper, we present Hash Tree (H-Tree), a highly efficient index structure for event matching. H-Tree is essentially a hash table that combines hash lists with hash chaining. A hash list is built on an indexed attribute using novel overlapping divisions of the attribute's value domain, which makes space consumption more efficient. Multiple hash lists are then combined into a hash tree. The basic idea behind H-Tree is that matching efficiency improves when the search space is substantially reduced by pruning most of the subscriptions that do not match. We have implemented H-Tree and conducted extensive experiments in different settings. Experimental results demonstrate that H-Tree outperforms its counterparts by a large margin. In particular, its matching speed is three orders of magnitude faster than that of its counterparts when the numbers of both subscriptions and their component constraints are huge.
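
To make the pruning idea concrete, the following is a minimal sketch of an index in the spirit of H-Tree, not the authors' implementation. It assumes that attribute values are normalized to [0, 1], that each constraint is a closed range (lo, hi), and that ranges wider than one cell go to a catch-all cell that is always searched. The class name HTreeSketch, the cells parameter, the catch-all cell, and the tolerance EPS are all hypothetical choices made for illustration.

```python
from itertools import product

EPS = 1e-9  # assumed tolerance for floating-point cell boundaries


class HTreeSketch:
    """Illustrative index: one hash list per indexed attribute, combined into
    buckets keyed by a tuple of cell ids (one id per indexed attribute)."""

    def __init__(self, indexed_attrs, cells):
        self.attrs = list(indexed_attrs)
        self.cells = cells      # number of cells per attribute
        self.buckets = {}       # cell-id tuple -> list of (sub_id, constraints)

    def _cell_of_range(self, lo, hi):
        # Wide ranges (and missing constraints) use the catch-all cell id.
        if hi - lo > 1.0 / self.cells:
            return self.cells
        # Narrow ranges are hashed by their center. Because a range may stick
        # out of its cell by up to half a cell, neighboring cells overlap.
        center = (lo + hi) / 2.0
        return min(int(center * self.cells), self.cells - 1)

    def _cells_of_value(self, v):
        # Cells whose (overlapping) extent can contain the event value v.
        c = self.cells
        j = min(int(v * c), c - 1)
        offset = v * c - j      # position of v inside its cell
        candidates = {j, c}     # own cell plus the catch-all cell
        if offset <= 0.5 + EPS and j > 0:
            candidates.add(j - 1)
        if offset >= 0.5 - EPS and j < c - 1:
            candidates.add(j + 1)
        return candidates

    def insert(self, sub_id, constraints):
        key = tuple(
            self._cell_of_range(*constraints[a]) if a in constraints else self.cells
            for a in self.attrs
        )
        self.buckets.setdefault(key, []).append((sub_id, constraints))

    def match(self, event):
        per_attr = [
            self._cells_of_value(event[a]) if a in event else {self.cells}
            for a in self.attrs
        ]
        matched = []
        # Only buckets reachable from the event's cells are visited; every
        # other bucket is pruned without looking at its subscriptions.
        for key in product(*per_attr):
            for sub_id, constraints in self.buckets.get(key, ()):
                if all(a in event and lo <= event[a] <= hi
                       for a, (lo, hi) in constraints.items()):
                    matched.append(sub_id)
        return sorted(matched)


index = HTreeSketch(["price", "volume"], cells=10)
index.insert("s1", {"price": (0.20, 0.25), "volume": (0.50, 0.55)})
index.insert("s2", {"price": (0.10, 0.90)})  # wide range, catch-all cell
index.insert("s3", {"price": (0.29, 0.31), "volume": (0.50, 0.58),
                    "size": (0.0, 0.5)})     # "size" is not indexed
print(index.match({"price": 0.30, "volume": 0.52, "size": 0.4}))
```

In this sketch an event inspects at most a few cells per indexed attribute, so the number of buckets visited depends on the number of indexed attributes rather than on the number of stored subscriptions. Constraints on attributes that are not indexed are checked only during the final verification of each candidate.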
