H-Tree: An efficient index structure for event matching in publish/subscribe systems

Content-based publish/subscribe systems have been employed to handle complex distributed information flows in many applications. Event matching is widely recognized as a fundamental component of such large-scale systems. To match an event, the system searches a space composed of all subscriptions. As the scale and complexity of a system grow, the efficiency of event matching becomes more critical to system performance. Most existing methods suffer performance degradation when a system has both a large number of subscriptions and a large number of constraints. In this paper, we present H-Tree (Hash Tree), a highly efficient index structure for event matching. H-Tree is essentially a hash table that combines hash lists with hash chaining. A hash list is built on an indexed attribute by dividing the attribute's value domain into cells. Multiple hash lists are chained into a hash tree. The basic idea behind H-Tree is that matching efficiency improves when the search space is substantially reduced by pruning most of the subscriptions that cannot match. We have implemented H-Tree and conducted extensive experiments in different settings. Experimental results show that H-Tree outperforms its counterparts by a large margin. In particular, matching is three orders of magnitude faster than its counterparts when both the number of subscriptions and the number of constraints are large.
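The cell-and-chain idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `HTreeSketch`, the normalized value domain [0, 1], the equal-width cells, and the rule of storing a subscription in every cell combination its ranges overlap are all assumptions made for illustration. The abstract does not specify how cells are laid out or how subscriptions are assigned to them.

```python
from itertools import product


class HTreeSketch:
    """Illustrative hash tree: one hash list (level) per indexed attribute.

    Assumptions (not taken from the paper): attribute values lie in [0, 1],
    cells have equal width, and every event carries all indexed attributes.
    """

    def __init__(self, indexed_attrs, cells_per_attr):
        self.indexed_attrs = list(indexed_attrs)
        self.cells = cells_per_attr
        # Key: tuple of cell indices, one per indexed attribute (a path
        # through the chained hash lists). Value: subscriptions in that bucket.
        self.buckets = {}

    def _cell(self, value):
        # Map a value in [0, 1] to a cell index; 1.0 falls into the last cell.
        return min(int(value * self.cells), self.cells - 1)

    def subscribe(self, sub_id, constraints):
        # constraints: dict mapping attribute -> (lo, hi) closed range.
        # An indexed attribute without a constraint spans the whole domain.
        per_level = []
        for attr in self.indexed_attrs:
            lo, hi = constraints.get(attr, (0.0, 1.0))
            per_level.append(range(self._cell(lo), self._cell(hi) + 1))
        for path in product(*per_level):
            self.buckets.setdefault(path, []).append((sub_id, constraints))

    def match(self, event):
        # Hash the event to a single bucket, pruning all other subscriptions,
        # then verify every constraint of each remaining candidate.
        path = tuple(self._cell(event[attr]) for attr in self.indexed_attrs)
        matched = []
        for sub_id, constraints in self.buckets.get(path, []):
            if all(
                attr in event and lo <= event[attr] <= hi
                for attr, (lo, hi) in constraints.items()
            ):
                matched.append(sub_id)
        return matched


tree = HTreeSketch(indexed_attrs=["price", "volume"], cells_per_attr=4)
tree.subscribe("s1", {"price": (0.1, 0.2), "volume": (0.5, 0.9)})
tree.subscribe("s2", {"price": (0.6, 0.8)})
print(tree.match({"price": 0.15, "volume": 0.7}))
```

In this sketch the event is hashed once per indexed attribute, so only the subscriptions in one bucket are checked in full. The trade-off of the assumed placement rule is that a subscription with wide or missing ranges is copied into many buckets.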
