SCSL: Optimizing Matching Algorithms to Improve Real-time for Content-based Pub/Sub Systems

Although many matching algorithms have been proposed to improve the matching efficiency of content-based publish/subscribe systems, existing work seldom considers the real-time performance of event dissemination from the perspective of event matching. In this paper, building on two existing matching algorithms, we propose a subscription-classifying and structure-layering (SCSL) optimization method for matching algorithms that aims to improve real-time performance by shortening the time needed to determine matching subscriptions. The basic idea of SCSL is that subscriptions with high matching probabilities should be processed first during event matching, and that their storage positions in the data structure should be adjusted as those probabilities change. One challenge for SCSL is the trade-off between the gain in real-time performance from identifying matching subscriptions earlier and the cost of the extra matching time introduced by subscription classification and adjustment. We design a concise scheme to classify subscriptions, establish a lightweight adjustment mechanism to deal with dynamics, and propose an efficient greedy algorithm to compute the adjustment solution, which alleviates the impact of SCSL on matching performance. Experimental results show that the 95th percentile of the determining time of matching subscriptions is improved by about 70%. Furthermore, we integrate SCSL into Apache Kafka to augment it as a content-based publish/subscribe system and evaluate SCSL on real-world stock trace data; the average event transfer latency improves by about 40%, which confirms that SCSL can effectively improve the real-time performance of content-based publish/subscribe systems.
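The basic idea described above, processing likely-to-match subscriptions first and relocating them as their matching probabilities change, can be illustrated with a minimal sketch. This is not the SCSL algorithm itself: the two-layer split, the observed match rate used as the probability estimate, the `hot_threshold` parameter, and all class and method names are assumptions made for illustration, and the paper's classification scheme and greedy adjustment algorithm are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    sub_id: str
    predicates: dict  # attribute name -> (low, high) inclusive interval
    hits: int = 0     # number of events this subscription matched
    seen: int = 0     # number of events it was checked against

    def matches(self, event):
        # A subscription matches when every predicate interval contains
        # the event's value for that attribute.
        return all(
            attr in event and low <= event[attr] <= high
            for attr, (low, high) in self.predicates.items()
        )

    def match_rate(self):
        # Observed match frequency, used as a stand-in for matching probability.
        return self.hits / self.seen if self.seen else 0.0


class LayeredMatcher:
    """Hypothetical two-layer store: 'hot' subscriptions are checked first."""

    def __init__(self, hot_threshold=0.5):
        self.hot_threshold = hot_threshold  # assumed tuning parameter
        self.hot = []
        self.cold = []

    def subscribe(self, sub):
        # New subscriptions have no history, so they start in the cold layer.
        self.cold.append(sub)

    def match(self, event):
        # Scan the hot layer before the cold layer, so subscriptions that
        # usually match are identified earlier in the scan.
        matched = []
        for sub in self.hot + self.cold:
            sub.seen += 1
            if sub.matches(event):
                sub.hits += 1
                matched.append(sub.sub_id)
        return matched

    def adjust(self):
        # Re-layer all subscriptions according to their current match rate.
        subs = self.hot + self.cold
        self.hot = [s for s in subs if s.match_rate() >= self.hot_threshold]
        self.cold = [s for s in subs if s.match_rate() < self.hot_threshold]


matcher = LayeredMatcher()
matcher.subscribe(Subscription("rare", {"price": (0, 10)}))
matcher.subscribe(Subscription("common", {"price": (0, 100)}))
matcher.match({"price": 50})
matcher.match({"price": 60})
matcher.adjust()  # "common" has matched every event so far and moves to the hot layer
```

In this toy version the matcher still scans every subscription, so total matching time is unchanged; only the order in which matches are found differs. How often `adjust` runs is where the trade-off from the abstract shows up: frequent re-layering tracks changing probabilities more closely but adds overhead to matching.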
