Elastic stream processing in the Cloud

Stream processing is a computing paradigm that emerged from the need to handle high volumes of data in real time. In contrast to traditional databases, stream-processing systems run continuous queries and process data on-the-fly as it arrives. Today, a wide range of application areas relies on efficient pattern detection and querying over streams. The advent of Cloud computing fosters the development of elastic stream-processing platforms, which can dynamically adapt their resource usage according to different cost-benefit trade-offs. This article provides an overview of the historical evolution and the key concepts of stream processing, with special focus on adaptivity and Cloud-based elasticity.
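To make the two central ideas of the abstract concrete, here is a minimal sketch that is not taken from the article. It shows (a) a continuous query that emits a result after every incoming event instead of running once over stored data, and (b) a toy elasticity rule that trades resource cost against a penalty for unprocessed load. The function names `sliding_count` and `choose_instances` and the linear cost model are hypothetical and chosen purely for illustration. The window operator also assumes that events arrive in timestamp order.

```python
import math
from collections import deque
from typing import Iterable, Iterator, Tuple


def sliding_count(events: Iterable[Tuple[float, str]],
                  window: float, key: str) -> Iterator[Tuple[float, int]]:
    """Hypothetical continuous query: after each event, emit how many events
    with `key` arrived in the time interval (ts - window, ts].

    Events are consumed one at a time and only timestamps inside the window
    are kept, so the input stream is never stored as a whole.
    Assumes events arrive in non-decreasing timestamp order."""
    buf: deque = deque()  # timestamps of matching events still in the window
    for ts, k in events:
        if k == key:
            buf.append(ts)
        # Evict timestamps that have slid out of the window.
        while buf and buf[0] <= ts - window:
            buf.popleft()
        yield ts, len(buf)


def choose_instances(arrival_rate: float, per_instance_rate: float,
                     cost_per_instance: float, penalty_per_unit_overload: float,
                     max_instances: int) -> int:
    """Hypothetical elasticity rule: pick the number of operator instances
    that minimizes resource cost plus a penalty for load that cannot be
    processed (an illustrative linear cost-benefit model, not the article's)."""
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    best_n, best_cost = 1, math.inf
    for n in range(1, max_instances + 1):
        overload = max(0.0, arrival_rate - n * per_instance_rate)
        total = n * cost_per_instance + overload * penalty_per_unit_overload
        if total < best_cost:
            best_n, best_cost = n, total
    return best_n


if __name__ == "__main__":
    stream = [(0.0, "a"), (1.0, "a"), (2.0, "b"), (3.5, "a")]
    for ts, count in sliding_count(stream, window=3.0, key="a"):
        print(ts, count)
    # A high overload penalty favours scaling out; a low one favours saving cost.
    print(choose_instances(250, 100, 1.0, 0.05, 10))
    print(choose_instances(250, 100, 1.0, 0.001, 10))
```

In this toy model, raising `penalty_per_unit_overload` makes the rule provision enough instances to absorb the arrival rate, while a small penalty makes it accept some overload to save resources. This is the kind of cost-benefit trade-off that an elastic platform has to balance at runtime.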
