E-Storm: Replication-Based State Management in Distributed Stream Processing Systems

Apache Storm is a fault-tolerant, distributed in-memory computation system for processing large volumes of high-velocity data in real time. As an integral part of its fault-tolerance mechanism, Storm's state management is implemented by a checkpointing framework, which commits states regularly and recovers lost states from the latest checkpoint. However, this method relies on a remote data store for state preservation and access, which imposes significant overhead on error-free execution. In this paper, we propose E-Storm, a replication-based state management system that actively maintains multiple state backups on different worker nodes. We build a prototype on top of Storm by extending it with monitoring and recovery modules that support inter-task state transfer whenever needed. Experiments on synthetic and real-world streaming applications confirm that E-Storm outperforms the existing checkpointing method in application performance, achieving up to a 9.44-fold throughput improvement while reducing application latency to as little as 9.8% of that under checkpointing.
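The replication idea described above can be illustrated with a toy, in-process model. This is only a sketch under stated assumptions, not E-Storm's implementation and not Storm's API: the names `Task`, `ReplicatedStateManager`, `update`, `fail_node`, and `recover` are hypothetical, state is a plain dictionary, and backup placement is fixed once at start-up. It shows the two properties the abstract relies on: backups live on worker nodes other than the owner's, and a lost state is restored by transfer from a surviving peer task rather than from a remote store.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A stateful task pinned to one worker node (hypothetical model)."""
    name: str
    node: str
    state: dict = field(default_factory=dict)
    # Copies of other tasks' states held by this task: owner name -> state copy.
    backups: dict = field(default_factory=dict)


class ReplicatedStateManager:
    """Toy replication-based state manager (assumption: not E-Storm's design)."""

    def __init__(self, tasks, replicas=2):
        self.tasks = {t.name: t for t in tasks}
        # Fix the backup holders of every task once, at start-up.
        self.holders = {t.name: self._pick_holders(t, replicas) for t in tasks}

    def _pick_holders(self, owner, replicas):
        """Choose up to `replicas` tasks, each on a distinct node other than the owner's."""
        holders, used_nodes = [], {owner.node}
        for candidate in self.tasks.values():
            if len(holders) == replicas:
                break
            if candidate.node not in used_nodes:
                holders.append(candidate.name)
                used_nodes.add(candidate.node)
        return holders

    def update(self, task_name, key, value):
        """Apply a state change and push a fresh copy to every backup holder."""
        owner = self.tasks[task_name]
        owner.state[key] = value
        for holder_name in self.holders[task_name]:
            self.tasks[holder_name].backups[task_name] = dict(owner.state)

    def fail_node(self, node):
        """Simulate a node crash: its tasks lose their state and the backups they hold."""
        for task in self.tasks.values():
            if task.node == node:
                task.state.clear()
                task.backups.clear()

    def recover(self, task_name):
        """Restore a task's state from any surviving backup; return False if none is left."""
        owner = self.tasks[task_name]
        for holder_name in self.holders[task_name]:
            backup = self.tasks[holder_name].backups.get(task_name)
            if backup is not None:
                owner.state = dict(backup)
                return True
        return False


def demo():
    """Crash the node that hosts counter-0, then recover its state from a peer."""
    manager = ReplicatedStateManager(
        [Task("counter-0", "node-a"), Task("counter-1", "node-b"), Task("counter-2", "node-c")]
    )
    manager.update("counter-0", "hits", 3)
    manager.fail_node("node-a")
    recovered = manager.recover("counter-0")
    return recovered, manager.tasks["counter-0"].state
```

In this model, recovery succeeds as long as at least one node holding a backup survives, which is why the backups are spread across distinct nodes. The model ignores everything that makes the real problem hard, such as network transfer cost, consistency between replicas, and task rescheduling.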
