Adaptive Fault-Tolerance for Dynamic Resource Provisioning in Distributed Stream Processing Systems

A growing number of applications require continuous processing of high-throughput data streams, e.g., financial analysis, network traffic monitoring, or Big Data analytics for smart cities. Stream processing applications typically require specific quality-of-service levels to achieve their goals; yet, due to the high time-variability of stream characteristics, it is often inefficient to statically allocate the resources needed to guarantee application Service Level Agreements (SLAs). In this paper, we present LAAR, a novel method for adaptive replication that trades fault tolerance for increased capacity during load spikes. We have implemented and validated LAAR as a middleware layer on top of IBM InfoSphere Streams®. We have performed an extensive set of experiments on an industrial-quality 60-core cluster deployment, and we show that, assuming only statistical knowledge of the stream load distribution, LAAR can reduce resource consumption while guaranteeing an upper bound on information loss in case of failures.
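The trade-off summarized above, giving up replicas during load spikes so that the same resources can absorb the extra load, can be illustrated with a toy model. The sketch below is hypothetical and is not LAAR's actual algorithm. It makes the following assumptions:

- The load distribution is discrete.
- Each operator has a known per-tuple CPU cost.
- Active replication uses at most one extra replica per operator.
- Every input tuple traverses every operator, as in a simple pipeline.
- A single failure is equally likely to hit any operator.
- Tuples arriving during a fixed recovery window are lost if the failed operator has no active replica.

All names (`LoadLevel`, `replication_plan`, `expected_loss_per_failure`, `full_replication_capacity`) and all numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadLevel:
    rate: float         # input rate in tuples/s (hypothetical)
    probability: float  # share of time the stream spends at this rate


def replication_plan(costs, rate, capacity):
    """Return the indices of operators that keep an active replica at `rate`.

    costs[i] is CPU-seconds per tuple for operator i (hypothetical model);
    capacity is CPU-seconds available per second. One copy of every operator
    must always run; extra replicas are activated cheapest-first while spare
    capacity remains.
    """
    base = rate * sum(costs)
    if base > capacity:
        raise ValueError(f"rate {rate} exceeds capacity even without replicas")
    spare = capacity - base
    replicated = set()
    for i in sorted(range(len(costs)), key=lambda j: costs[j]):
        extra = rate * costs[i]  # CPU demand of this operator's replica
        if extra > spare:
            break  # remaining operators are at least as expensive
        replicated.add(i)
        spare -= extra
    return replicated


def expected_loss_per_failure(levels, costs, capacity, recovery_s):
    """Expected tuples lost when one operator fails, averaged over the load
    distribution. A failed replicated operator loses nothing; a failed
    unreplicated one drops every input tuple arriving during `recovery_s`
    (assumes a pipeline where each tuple visits every operator)."""
    n = len(costs)
    loss = 0.0
    for lvl in levels:
        plan = replication_plan(costs, lvl.rate, capacity)
        unreplicated_share = (n - len(plan)) / n
        loss += lvl.probability * lvl.rate * recovery_s * unreplicated_share
    return loss


def full_replication_capacity(levels, costs):
    """Capacity a static deployment needs to replicate every operator at peak."""
    return 2 * max(lvl.rate for lvl in levels) * sum(costs)


if __name__ == "__main__":
    costs = [0.001, 0.002, 0.003]  # hypothetical CPU-seconds per tuple
    levels = [LoadLevel(100, 0.9), LoadLevel(300, 0.1)]
    capacity = 2.4                 # hypothetical CPU budget (cores)
    print(full_replication_capacity(levels, costs))
    print(expected_loss_per_failure(levels, costs, capacity, recovery_s=2.0))
```

With these made-up numbers, replicating every operator statically at the 300 tuples/s peak would need about 3.6 cores. The adaptive plan fits in 2.4 cores. At 100 tuples/s every operator stays replicated. At 300 tuples/s only the cheapest operator keeps its replica, which gives an expected loss of about 40 tuples per failure. A bound on information loss would then be checked against a figure of this kind. The cheapest-first greedy choice is only one simple policy: it maximizes the number of replicated operators, not any particular measure of output quality.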
