PEC: Proactive Elastic Collaborative Resource Scheduling in Data Stream Processing

In Distributed Parallel Stream Processing Systems (DPSPS), elastic resource allocation allows applications to respond dynamically to workload fluctuations. However, resource provisioning is particularly challenging because the workload is unpredictable. In addition, unlike CPU resources, bandwidth resources are often ignored in resource allocation. Moreover, resource allocation and resource placement are considered separately. In this paper, we investigate the proactive elastic resource scheduling problem for computation-intensive and communication-intensive applications, which aims to meet the latency requirement at minimal energy cost, and propose a dynamic collaborative strategy from a system-wide perspective. Specifically, we first build a collaborative workload prediction model to accurately predict the upcoming workload, and construct a latency estimation model to estimate the latency of the application. Then, we design an energy-efficient resource pre-allocation method that accounts for both CPU frequency adjustment and the stability of resource reconfigurations. Finally, we present a communication-aware resource placement approach. Simulation results show that, compared with reactive strategies, our strategy achieves markedly better latency performance and effectively avoids unnecessary resource adjustments. Meanwhile, energy consumption is reduced by about 50 percent on average, and the communication cost is kept at a low level of 4 percent.
