Scalable and Reliable Data Stream Processing

Data-stream management systems have long been considered a promising architecture for fast data management. The stream processing paradigm offers an attractive means of declaring persistent a ...
