Enabling Elastic Stream Processing in Shared Clusters

Distributed data stream processing has become an increasingly popular computational framework because many emerging applications, such as dynamic content delivery and security event analysis, require real-time processing of data. These applications often run on shared, multi-tenant clusters as companies consolidate from dedicated clusters for each application (batch and streaming) to a single cluster managed by a global cluster manager such as Hadoop YARN. In shared cluster environments, guaranteeing quality-of-service constraints on throughput and response time for both stream processing and batch applications is a significant challenge. Stream processing applications often face elastic demand, where the input rate can vary drastically. The typical way to handle workload elasticity is to guarantee the application enough resources, but this is not possible when resources are shared among multiple applications. In this paper, we present an approach for supporting elastic scaling of distributed data stream processing applications and for efficiently scheduling and coordinating stream processing with batch processing in shared clusters. Our solution consists of a congestion detection monitor, which detects bottlenecks in the streaming system, and a global state manager, which performs non-disruptive, stateful scaling of streaming applications. We implemented our solution using Storm, a popular stream processing framework, and tested our implementation on a Hadoop YARN cluster using a real-time security event processing workload. Our experimental results show that our solution improves stream processing application throughput by 49% over default Storm while decreasing average request response times by 58%.
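To make the idea of a congestion detection monitor concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes, purely for illustration, that a bottleneck shows up as an operator's input queue staying nearly full over several consecutive samples. The class name `CongestionMonitor` and the parameters `capacity`, `threshold`, and `window` are hypothetical and do not come from Storm or from the paper.

```python
from collections import deque


class CongestionMonitor:
    """Illustrative detector: flags congestion when queue occupancy stays high.

    All names and defaults here are assumptions made for this sketch.
    """

    def __init__(self, capacity, threshold=0.8, window=3):
        self.capacity = capacity      # assumed queue capacity, in tuples
        self.threshold = threshold    # occupancy ratio treated as "nearly full"
        self.window = window          # consecutive samples needed to flag
        self._ratios = deque(maxlen=window)  # keeps only the latest samples

    def observe(self, queue_length):
        """Record one queue-length sample; return True if congested."""
        self._ratios.append(queue_length / self.capacity)
        window_full = len(self._ratios) == self.window
        return window_full and all(r >= self.threshold for r in self._ratios)


monitor = CongestionMonitor(capacity=100, threshold=0.8, window=3)
flags = [monitor.observe(q) for q in [90, 95, 50, 85, 90, 99]]
print(flags)  # [False, False, False, False, False, True]
```

Requiring several consecutive high samples is one simple way to avoid reacting to a momentary burst. A real monitor would likely combine more signals, such as processing latency or input rate, before triggering a scaling action.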
