Relevance Matters: Capitalizing on Less (Top-k Matching in Publish/Subscribe)

The efficient processing of large collections of Boolean expressions plays a central role in major data intensive applications ranging from user-centric processing and personalization to real-time data analysis. Emerging applications such as computational advertising and selective information dissemination demand determining and presenting to an end-user only the most relevant content that is both user-consumable and suitable for limited screen real estate of target devices. To retrieve the most relevant content, we present BE*-Tree, a novel indexing data structure designed for effective hierarchical top-k pattern matching, which as its by-product also reduces the operational cost of processing millions of patterns. To further reduce processing cost, BE*-Tree employs an adaptive and non-rigid space-cutting technique designed to efficiently index Boolean expressions over a high-dimensional continuous space. At the core of BE*-Tree lie two innovative ideas: (1) a bi-directional tree expansion build as a top-down (data and space clustering) and a bottom-up growths (space clustering), which together enable indexing only non-empty continuous sub-spaces, and (2) an overlap-free splitting strategy. Finally, the performance of BE*-Tree is proven through a comprehensive experimental comparison against state-of-the-art index structures for matching Boolean expressions.

[1]  Hans-Arno Jacobsen,et al.  BE-tree: an index structure to efficiently match boolean expressions over high-dimensional discrete space , 2011, SIGMOD '11.

[2]  Ashwin Machanavajjhala,et al.  Scalable ranked publish/subscribe , 2008, Proc. VLDB Endow..

[3]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS.

[4]  Oliver Günther,et al.  Multidimensional access methods , 1998, CSUR.

[5]  Ihab F. Ilyas,et al.  A survey of top-k query processing techniques in relational database systems , 2008, CSUR.

[6]  Sergei Vassilvitskii,et al.  Efficiently evaluating complex boolean expressions , 2010, SIGMOD Conference.

[7]  Hector Garcia-Molina,et al.  Index structures for selective dissemination of information under the Boolean model , 1994, TODS.

[8]  Helmut Veith,et al.  Efficient filtering in publish-subscribe systems using binary decision diagrams , 2001, Proceedings of the 23rd International Conference on Software Engineering. ICSE 2001.

[9]  Dennis Shasha,et al.  Filtering algorithms and implementation for very fast publish/subscribe systems , 2001, SIGMOD '01.

[10]  Marcos K. Aguilera,et al.  Matching events in a content-based subscription system , 1999, PODC '99.

[11]  A. Guttmma,et al.  R-trees: a dynamic index structure for spatial searching , 1984 .

[12]  Sergei Vassilvitskii,et al.  Indexing Boolean Expressions , 2009, Proc. VLDB Endow..

[13]  Michael Freeston A general solution of the n-dimensional B-tree problem , 1995, SIGMOD '95.

[14]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[15]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.