A Preventive Auto-Parallelization Approach for Elastic Stream Processing

A growing number of sources (connected devices, social networks, etc.) emit real-time data at rates that fluctuate over time. Distributed stream processing engines (SPEs) must therefore solve a difficult problem: delivering results that satisfy end users in terms of quality and latency without over-consuming resources. This paper focuses on parallelizing operators so that their throughput adapts to their input rate. We propose an approach that prevents operator congestion in order to limit the degradation of result quality. The approach relies on automatic, dynamic adaptation of resource consumption for each continuous query. It combines (i) a metric estimating the activity level of operators in the near future, (ii) the AUTOSCALE approach, which evaluates the need to modify parallelism degrees at local and global scope, and (iii) an integration into Apache Storm. We present performance tests comparing our approach to the native solution of this SPE.
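As an illustration of the general idea of preventive scaling, the sketch below forecasts an operator's input rate from recent samples and derives a parallelism degree before congestion occurs. It is not the AUTOSCALE algorithm or the paper's metric: the linear-trend forecast, the function names `forecast_rate` and `target_parallelism`, and the parameters `capacity_per_instance`, `horizon`, and `max_degree` are all assumptions made for this example.

```python
import math


def forecast_rate(samples, horizon=1):
    """Extrapolate a least-squares linear trend `horizon` steps past the last sample.

    `samples` are equally spaced input-rate measurements (e.g. tuples/s).
    Hypothetical stand-in for a near-future activity metric.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    if n == 1:
        return float(samples[0])
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    # A rate cannot be negative, so clamp the extrapolated value at zero.
    return max(0.0, intercept + slope * (n - 1 + horizon))


def target_parallelism(samples, capacity_per_instance, horizon=1, max_degree=16):
    """Smallest number of operator instances able to absorb the forecast rate.

    `capacity_per_instance` is the assumed throughput of one instance, in the
    same unit as the samples. The result is clamped to [1, max_degree].
    """
    predicted = forecast_rate(samples, horizon)
    degree = math.ceil(predicted / capacity_per_instance)
    return min(max(degree, 1), max_degree)


if __name__ == "__main__":
    rising = [100, 200, 300, 400]
    print(target_parallelism(rising, capacity_per_instance=150))
```

Scaling on the forecast rather than on the last observed rate is what makes such a scheme preventive: with a rising input, instances are added before the operator's queue starts to grow. A real integration would also have to weigh the cost of reconfiguring a running query, which this sketch ignores.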
