An Implementation of Content-Based Pub/Sub System via Stream Computation

The sheer volume of data delivered over the Internet calls for a more flexible and powerful communication model. As an expressive, loosely coupled, asynchronous messaging model, publish-subscribe (Pub/Sub) has been widely adopted. Traditional topic-based Pub/Sub systems do not interpret the content of the messages they deliver, so all messages must be classified into a set of topics in advance. Content-based Pub/Sub systems can instead choose the subscribers for each message dynamically, based on its metadata. Existing distributed Pub/Sub systems are built on an overlay network consisting of message brokers, which can adapt to heterogeneous networks but inevitably impairs performance. In this paper, we design a novel centralized, tiered content-based Pub/Sub system with a four-layer architecture. In the access layer, a customized naming strategy is proposed to achieve high availability. Internal message routing is performed in the routing layer, where a sharding scheme is used to lower routing overhead. In the computation layer, a two-step streaming computation model is used to boost performance. In the storage layer, we adopt the column-oriented database HBase for persistence. A comprehensive set of experiments was conducted to verify that our system achieves excellent performance, linear scalability, and high availability.
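
To make the contrast with topic-based delivery concrete, the following is a minimal sketch of content-based matching, in which subscribers are selected per message from its metadata instead of from a pre-assigned topic. It is an illustration only and not the system described in this paper: the names `Subscription` and `match_subscribers`, the predicate representation, and the sample attributes are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical representation: a message's metadata is a flat attribute map.
Metadata = Dict[str, Any]


@dataclass
class Subscription:
    """A subscriber plus the attribute constraints it registered (assumed shape)."""
    subscriber: str
    # attribute name -> predicate over that attribute's value
    predicates: Dict[str, Callable[[Any], bool]] = field(default_factory=dict)

    def matches(self, metadata: Metadata) -> bool:
        # The message matches only if every constrained attribute is present
        # and satisfies its predicate. No constraints means "match everything".
        return all(
            name in metadata and predicate(metadata[name])
            for name, predicate in self.predicates.items()
        )


def match_subscribers(metadata: Metadata, subscriptions: List[Subscription]) -> List[str]:
    """Select subscribers for one message by a linear scan over subscriptions."""
    return [s.subscriber for s in subscriptions if s.matches(metadata)]


subscriptions = [
    Subscription("alice", {"symbol": lambda v: v == "ACME", "price": lambda v: v > 100}),
    Subscription("bob", {"price": lambda v: v < 50}),
    Subscription("carol"),  # no constraints: receives every message
]

matched = match_subscribers({"symbol": "ACME", "price": 120}, subscriptions)
print(matched)
```

The linear scan is the simplest possible matcher and is used here only for clarity. A production system would typically index or partition the subscriptions to avoid evaluating every predicate for every message.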
