Towards In-Order and Exactly-Once Delivery Using Hierarchical Distributed Message Queues

Distributed message queues are used in many systems and play different roles (e.g. content delivery, notification systems, and message delivery tools). It is important for queue services to deliver messages at large scale, across a variety of message sizes, and under high concurrency. An example of a commercial state-of-the-art distributed message queue is Amazon Simple Queue Service (SQS). SQS is a highly scalable distributed message delivery fabric. It can queue an unlimited number of short messages (maximum size: 256 KB) and deliver them to multiple users in parallel. To provide such high throughput at large scale, SQS omits some of the features provided by traditional queues: it guarantees neither the order of messages nor exactly-once delivery. This paper addresses these limitations through the design and implementation of HDMQ, a hierarchical distributed message queue. HDMQ consists of a collection of area message nodes that can store messages of up to 512 KB. It uses a local round-robin load balancer to place each message on a node and to scale across the area region. HDMQ replicates messages for high reliability. It also provides SQS-like APIs for compatibility with existing systems that use SQS. We performed a detailed performance evaluation and compared HDMQ to commonly used commercial distributed queues, measuring throughput, latency, and price per request. We found that HDMQ outperforms SQS, Windows Azure Service Bus, and IronMQ by 2x to 15x in throughput and 1.6x to 39x in latency, at 13% to 80% lower cost.
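
As a rough illustration of the round-robin placement mentioned in the abstract, the following minimal sketch rotates incoming messages across a fixed set of message nodes. It is not HDMQ's implementation: the class name RoundRobinBalancer, the send method, and the in-memory queues are assumptions made for this example, and replication, ordering, and exactly-once delivery are left out. Only the 512 KB size limit is taken from the abstract.

```python
from collections import deque
from itertools import cycle

# Size limit stated in the abstract (512 KB per message).
MAX_MESSAGE_BYTES = 512 * 1024


class RoundRobinBalancer:
    """Hypothetical local balancer: assigns each message to the next node in turn."""

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("at least one message node is required")
        # One in-memory FIFO queue per node (a stand-in for a real message node).
        self.queues = {name: deque() for name in node_names}
        self._next_node = cycle(list(node_names))

    def send(self, body: bytes) -> str:
        """Store the message on the next node and return that node's name."""
        if len(body) > MAX_MESSAGE_BYTES:
            raise ValueError("message exceeds the 512 KB limit")
        node = next(self._next_node)
        self.queues[node].append(body)
        return node


balancer = RoundRobinBalancer(["node-a", "node-b", "node-c"])
placements = [balancer.send(b"payload") for _ in range(5)]
print(placements)
```

With three nodes, five consecutive sends land on node-a, node-b, node-c, node-a, node-b, so load spreads evenly regardless of message content.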
