Reliable Messaging to Millions of Users with MigratoryData

Web-based notification services are used by a wide range of businesses to selectively distribute live updates to customers, following the publish/subscribe (pub/sub) model. Typical deployments can involve millions of subscribers who expect ordering and delivery guarantees together with low latency. Notification services must scale both vertically and horizontally, and must use replication to provide a reliable service. We report our experience building and operating MigratoryData, a highly scalable notification service. We discuss the typical requirements of MigratoryData customers and describe the architecture and design of the service, focusing on scalability and fault tolerance. Our evaluation demonstrates that MigratoryData can handle millions of concurrent connections and sustain a reliable notification service despite server failures and network disconnections.
