Self-adaptive processing graph with operator fission for elastic stream processing

Self-adaptive mechanism for scaling stream processing systems. Automatic scaling by increasing or decreasing the number of processing operators. Model that changes the graph topology based on reactive and predictive algorithms. Results show that both algorithms enable online self-adaptation of the graph.

The information generated by interactions on the Internet is growing exponentially, creating massive, continuous flows of events from highly diverse sources. These interactions contain valuable information for domains such as government, commerce, and banking. Extracting information from such data in near real time requires powerful processing tools that can cope with high-velocity, high-volume streams of events. Specially designed distributed processing engines build a graph-based topology with a static number of processing operators, which creates bottlenecks and load-balancing problems when processing dynamic flows of events. In this work we propose a self-adaptive processing graph that provides elasticity and scalability by automatically increasing or decreasing the number of processing operators to improve the performance and resource utilization of the system. Our solution uses a model that monitors, analyzes, and changes the graph topology with a control algorithm that is both reactive and proactive with respect to the flow of events. We evaluated our solution with three stream processing applications; the results show that our model can adapt the graph topology when receiving events at a high rate with sudden peaks, at very low cost in memory and CPU usage.
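As an illustration only, the monitor, analyze, and change cycle described above can be sketched in a few lines of Python. The abstract does not specify the actual algorithms, so everything here is an assumption: the names `OperatorStats`, `reactive_decision`, `predictive_decision`, and `adapt` are hypothetical, the utilization thresholds (0.8 and 0.3) are arbitrary, and the predictive step uses a plain linear extrapolation of recent arrival rates as a stand-in for whatever forecasting model the paper uses.

```python
from dataclasses import dataclass


@dataclass
class OperatorStats:
    """Monitored figures for one logical operator (hypothetical structure)."""
    replicas: int         # current number of parallel operator instances
    arrival_rate: float   # events per second arriving at the operator
    service_rate: float   # events per second a single replica can process


def utilization(stats, arrival_rate=None):
    """Load factor: arrival rate divided by total processing capacity."""
    rate = stats.arrival_rate if arrival_rate is None else arrival_rate
    return rate / (stats.replicas * stats.service_rate)


def decide(rho, replicas, high, low):
    """Return +1 (add a replica), -1 (remove one), or 0 (no change)."""
    if rho > high:
        return 1
    if rho < low and replicas > 1:   # never drop below one replica
        return -1
    return 0


def reactive_decision(stats, high=0.8, low=0.3):
    """React to the load measured right now (assumed thresholds)."""
    return decide(utilization(stats), stats.replicas, high, low)


def predictive_decision(stats, history, high=0.8, low=0.3):
    """Act on a forecast of the next arrival rate.

    The forecast is a linear extrapolation over the recent history,
    an illustrative stand-in and not the paper's predictive model.
    """
    if len(history) < 2:
        return 0
    slope = (history[-1] - history[0]) / (len(history) - 1)
    forecast = max(0.0, history[-1] + slope)
    return decide(utilization(stats, forecast), stats.replicas, high, low)


def adapt(stats, history):
    """One adaptation step: the reactive rule first, then the predictive one."""
    delta = reactive_decision(stats)
    if delta == 0:
        delta = predictive_decision(stats, history)
    return stats.replicas + delta


if __name__ == "__main__":
    overloaded = OperatorStats(replicas=2, arrival_rate=900, service_rate=500)
    print(adapt(overloaded, [850, 900]))
```

In this sketch the reactive rule takes precedence because it responds to load that is already measured, while the predictive rule only acts when the current load looks acceptable but the trend suggests it will not stay that way.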
