Dhalion: Self-Regulating Stream Processing in Heron

In recent years, the need for large-scale real-time analytics has grown rapidly, and many streaming systems have been developed to support such applications. These systems can continue processing streams even in the face of hardware and software failures. However, they do not address some crucial challenges facing their operators: the manual, time-consuming, and error-prone task of tuning various configuration knobs to achieve service level objectives (SLOs), and the maintenance of those SLOs in the face of sudden, unpredictable load variation and hardware or software performance degradation. In this paper, we introduce the notion of self-regulating streaming systems and the key properties that they must satisfy. We then present the design and evaluation of Dhalion, a system that provides self-regulation capabilities to underlying streaming systems. We describe our implementation of the Dhalion framework on top of Twitter Heron, as well as a number of policies that automatically reconfigure Heron topologies to meet throughput SLOs, scaling resource consumption up and down as needed. We experimentally evaluate our Dhalion policies in a cloud environment and demonstrate their effectiveness. We are in the process of open-sourcing our Dhalion policies as part of the Heron project.
