PhSIH: A Lightweight Parallelization of Event Matching in Content-based Pub/Sub Systems

The matching algorithm is a critical component of a content-based publish/subscribe system, and its performance directly affects the QoS of the whole system. To improve and stabilize matching performance, we propose PhSIH, a lightweight parallelization method built on three existing matching algorithms. PhSIH achieves Parallelization by horizontally Segmenting the Indexing Hierarchy of a matching algorithm's data structure, so that multiple threads can perform matching tasks in parallel on a shared data structure. PhSIH adaptively adjusts the degree of parallelism as the workload changes in order to meet the performance requirement. Its main work therefore consists of two parts: dynamically adjusting the degree of parallelism and computing a task allocation for the parallel threads. We implement PhSIH in Apache Kafka, extending it into a content-based publish/subscribe system suitable for real-time, fine-grained event dissemination scenarios such as stock ticks. To evaluate the parallelization effect and adaptability of PhSIH, we conduct a series of experiments on synthetic and real-world data. The results show that PhSIH parallelizes the three existing algorithms effectively and adapts well enough to stabilize their matching performance.
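The idea of horizontal segmentation can be illustrated with a minimal sketch. It is not the PhSIH implementation: the bucketed index, the interval predicates, the greedy cost-balanced split, and the ideal-speedup rule for choosing the degree of parallelism are all assumptions made for illustration, as are the names `split_segments`, `parallel_match`, and `choose_parallelism`. The sketch shows only the thread structure; in CPython the global interpreter lock prevents a real speedup for this CPU-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor


def split_segments(costs, parts):
    """Cut bucket indices into at most `parts` contiguous ranges of similar cost."""
    total = sum(costs)
    bounds, start, acc = [], 0, 0.0
    for i, cost in enumerate(costs):
        acc += cost
        # Close the current range once it reaches its share of the total cost,
        # but always leave the last range open to absorb the remainder.
        if parts - len(bounds) > 1 and acc >= total * (len(bounds) + 1) / parts:
            bounds.append((start, i + 1))
            start = i + 1
    if start < len(costs) or not bounds:
        bounds.append((start, len(costs)))
    return bounds


def matches(subscription, event):
    """A subscription matches if every interval predicate holds on the event."""
    return all(
        attr in event and low <= event[attr] <= high
        for attr, (low, high) in subscription.items()
    )


def match_segment(buckets, first, last, event):
    """Match one horizontal segment (buckets[first:last]) of the shared index."""
    return [
        sub_id
        for bucket in buckets[first:last]
        for sub_id, subscription in bucket
        if matches(subscription, event)
    ]


def parallel_match(buckets, event, parallelism):
    """Match an event with `parallelism` threads reading one shared index."""
    # Assumed cost model: a bucket costs as much as the subscriptions it holds.
    ranges = split_segments([len(b) for b in buckets], parallelism)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        parts = list(
            pool.map(lambda r: match_segment(buckets, r[0], r[1], event), ranges)
        )
    return sorted(sub_id for part in parts for sub_id in part)


def choose_parallelism(serial_ms, target_ms, max_threads):
    """Smallest thread count meeting the target under an ideal-speedup assumption."""
    for threads in range(1, max_threads + 1):
        if serial_ms / threads <= target_ms:
            return threads
    return max_threads


if __name__ == "__main__":
    # Hypothetical index: three buckets of (subscription id, predicates).
    index = [
        [(1, {"price": (0, 10)})],
        [(2, {"price": (5, 20)}), (3, {"volume": (100, 200)})],
        [(4, {"price": (11, 30)})],
    ]
    tick = {"price": 8, "volume": 150}
    threads = choose_parallelism(serial_ms=8.0, target_ms=2.0, max_threads=8)
    print(threads, parallel_match(index, tick, threads))
```

Because each thread only reads a disjoint range of the shared index, no locking is needed during matching. Changing the degree of parallelism only requires recomputing the ranges, which is the sense in which such a scheme stays lightweight.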
