Message Latency-Based Load Shedding Mechanism in Apache Kafka

Apache Kafka is a distributed message queuing platform that delivers data streams in real time. Through distributed processing, Kafka can deliver very large data streams at high speed. However, when a data explosion occurs, message latency increases sharply and the system may be interrupted. This paper proposes a load shedding engine for Kafka that addresses this latency problem. The engine handles data explosions with a simple mechanism in the Kafka producer: when latency exceeds a given threshold, it restricts the transmission of some messages. Experiments with Apache Storm-based real-time applications show that, with load shedding enabled, latency stays at a constant level instead of increasing continuously, for both single and multiple data streams. This is the first attempt to apply a load shedding technique to Kafka-based real-time stream processing, and it provides simple and efficient control of data explosions.
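
The threshold rule described in the abstract can be illustrated with a minimal sketch. This is not the paper's engine: the class name `LatencyLoadShedder`, the `shed_ratio` parameter, the `report_latency` hook, and the choice to drop messages at random are all assumptions made for illustration, and the `send` callable stands in for a real Kafka producer call.

```python
import random
from typing import Callable, Optional


class LatencyLoadShedder:
    """Hypothetical producer-side wrapper: while the most recently reported
    latency is above a threshold, a fraction of messages is dropped."""

    def __init__(self, send: Callable[[bytes], None], threshold_ms: float,
                 shed_ratio: float = 0.5,
                 rng: Optional[random.Random] = None):
        if not 0.0 <= shed_ratio <= 1.0:
            raise ValueError("shed_ratio must be in [0, 1]")
        self._send = send                  # stand-in for a real producer send
        self._threshold_ms = threshold_ms  # latency limit that triggers shedding
        self._shed_ratio = shed_ratio      # assumed: fraction dropped when overloaded
        self._rng = rng or random.Random()
        self._latency_ms = 0.0             # latest latency measurement
        self.sent = 0
        self.dropped = 0

    def report_latency(self, latency_ms: float) -> None:
        """Record the latest measured message latency in milliseconds."""
        self._latency_ms = latency_ms

    def offer(self, message: bytes) -> bool:
        """Forward the message, or drop it if shedding applies.

        Returns True if the message was sent, False if it was shed.
        """
        overloaded = self._latency_ms > self._threshold_ms
        if overloaded and self._rng.random() < self._shed_ratio:
            self.dropped += 1
            return False
        self._send(message)
        self.sent += 1
        return True


if __name__ == "__main__":
    delivered = []
    shedder = LatencyLoadShedder(delivered.append, threshold_ms=100.0,
                                 shed_ratio=1.0)
    shedder.offer(b"normal load")      # latency below threshold: sent
    shedder.report_latency(250.0)      # simulated data explosion
    shedder.offer(b"during overload")  # shed, because shed_ratio is 1.0
    shedder.report_latency(50.0)       # latency has recovered
    shedder.offer(b"after recovery")   # sent again
    print(delivered, shedder.sent, shedder.dropped)
```

Random dropping is only one possible policy. The abstract does not say how the messages to restrict are selected, so a real engine might instead shed by priority, by stream, or by content.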