PS-Tree-Based Efficient Boolean Expression Matching for High-Dimensional and Dense Workloads

Boolean expression matching is an important function for many applications. However, existing solutions still suffer from limitations when applied to high-dimensional and dense workloads. To overcome these limitations, we design a data structure called PS-Tree that efficiently indexes subscriptions in one dimension. By dividing predicates into disjoint predicate spaces, PS-Tree achieves high matching performance and good expressiveness. Based on PS-Tree, we first propose a Boolean expression matching algorithm, PSTBloom. By efficiently filtering out a large proportion of unmatching subscriptions, PSTBloom achieves high matching performance, especially for high-dimensional workloads. PSTBloom also achieves fast index construction and a small memory footprint. Comprehensive experiments show that, compared with state-of-the-art methods, PSTBloom reduces matching time, index construction time, and memory usage by up to 84%, 78%, and 94%, respectively. Although PSTBloom is effective for many workload distributions, dense workloads pose new challenges for PSTBloom and other algorithms. To handle dense workloads effectively, we further propose the PSTHash algorithm, which divides subscriptions into disjoint multidimensional predicate spaces. This organization prunes partially matching subscriptions efficiently. Comprehensive experiments on both synthetic and real-world datasets show that PSTHash improves matching performance by up to 92% for dense workloads.

PVLDB Reference Format:
Shuping Ji, Hans-Arno Jacobsen. PS-Tree-Based Efficient Boolean Expression Matching for High-Dimensional and Dense Workloads. PVLDB, 12(3): 251-264, 2018. DOI: https://doi.org/10.14778/3291264.3291270
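The idea of dividing one-dimensional predicates into disjoint predicate spaces can be illustrated with a small sketch. This is not the PS-Tree itself, whose internal structure the abstract does not describe. It is a simplified illustration that assumes every predicate is a closed integer interval on a single attribute; the function names build_disjoint_spaces and match are hypothetical.

```python
from bisect import bisect_right


def build_disjoint_spaces(predicates):
    """Split overlapping interval predicates into disjoint segments.

    predicates: dict mapping a subscription id to a closed integer
    interval (lo, hi) on one attribute (an assumption of this sketch).
    Returns (cuts, sub_sets): segment i covers [cuts[i], cuts[i + 1])
    and sub_sets[i] holds the ids whose predicate covers that segment.
    """
    # The set of covering predicates can only change at lo or at hi + 1.
    cuts = sorted(
        {lo for lo, _ in predicates.values()}
        | {hi + 1 for _, hi in predicates.values()}
    )
    sub_sets = [set() for _ in cuts]
    for sub_id, (lo, hi) in predicates.items():
        for i, start in enumerate(cuts):
            # A predicate covering a segment's start covers the whole segment.
            if lo <= start <= hi:
                sub_sets[i].add(sub_id)
    return cuts, sub_sets


def match(cuts, sub_sets, value):
    """Return the ids of all predicates satisfied by an event value."""
    i = bisect_right(cuts, value) - 1  # segment whose start is <= value
    if i < 0:
        return set()  # value lies below every predicate
    return sub_sets[i]


predicates = {"s1": (1, 10), "s2": (5, 15), "s3": (12, 20)}
cuts, sub_sets = build_disjoint_spaces(predicates)
print(match(cuts, sub_sets, 7))
print(match(cuts, sub_sets, 11))
```

Because the segments are disjoint, an event value falls into exactly one of them, so a single binary search retrieves every predicate it satisfies on that attribute. The quadratic construction loop is kept for readability only and says nothing about how the paper builds its index.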
