A comprehensive study on fault tolerance in stream processing systems

[1] Thomas S. Heinze, et al. Cloud-based data stream processing, 2014, DEBS '14.

[2] Kashi Venkatesh Vishwanath, et al. Characterizing cloud computing hardware reliability, 2010, SoCC '10.

[3] Fault-Tolerance and High Availability in Data Stream Management Systems, 2009, Encyclopedia of Database Systems.

[4] Leslie Lamport, et al. Distributed snapshots: determining global states of distributed systems, 1985, TOCS.

[5] Michael Stonebraker, et al. Aurora: a new model and architecture for data stream management, 2003, The VLDB Journal.

[6] Albert G. Greenberg, et al. Fault-tolerant stream processing using a distributed, replicated file system, 2008, Proc. VLDB Endow.

[7] Frederick Reiss, et al. TelegraphCQ: continuous dataflow processing, 2003, SIGMOD '03.

[8] Dhiraj K. Pradhan, et al. Roll-Forward and Rollback Recovery: Performance-Reliability Trade-Off, 1997, IEEE Trans. Computers.

[9] Pierre Sens, et al. Comparing Distributed Online Stream Processing Systems Considering Fault Tolerance Issues, 2014.

[10] W. Kent Fuchs, et al. Checkpoint Space Reclamation for Uncoordinated Checkpointing in Message-Passing Systems, 1995, IEEE Trans. Parallel Distributed Syst.

[11] Michael Stonebraker, et al. Fault-tolerance in the Borealis distributed stream processing system, 2005, SIGMOD '05.

[12] Robert Griesemer, et al. Paxos made live: an engineering perspective, 2007, PODC '07.

[13] Eric A. Brewer, et al. Highly available, fault-tolerant, parallel dataflows, 2004, SIGMOD '04.

[14] Daniel Mills, et al. MillWheel: Fault-Tolerant Stream Processing at Internet Scale, 2013, Proc. VLDB Endow.

[15] Ali Ghodsi, et al. Drizzle: Fast and Adaptable Stream Processing at Scale, 2017, SOSP.

[16] Indranil Gupta, et al. Stateful Scalable Stream Processing at LinkedIn, 2017, Proc. VLDB Endow.

[17] M. Abadi, et al. Naiad: a timely dataflow system, 2013, SOSP.

[18] Kurt Rothermel, et al. Rollback-recovery without checkpoints in distributed event processing systems, 2013, DEBS '13.

[19] Yogesh L. Simmhan, et al. RIoTBench: An IoT benchmark for distributed stream processing systems, 2017, Concurr. Comput. Pract. Exp.

[20] Thomas S. Heinze, et al. An adaptive replication scheme for elastic data stream processing systems, 2015, DEBS.

[21] Zhengping Qian, et al. TimeStream: reliable stream computation in the cloud, 2013, EuroSys '13.

[22] David Maier, et al. Exploiting Punctuation Semantics in Continuous Data Streams, 2003, IEEE Trans. Knowl. Data Eng.

[23] Patrick P. C. Lee, et al. Toward High-Performance Distributed Stream Processing via Approximate Fault Tolerance, 2016, Proc. VLDB Endow.

[24] Jeyhun Karimov, et al. Analyzing Efficient Stream Processing on Modern Hardware, 2019, Proc. VLDB Endow.

[25] Sanjay Ghemawat, et al. The Google file system, 2003.

[26] Scott Shenker, et al. Discretized streams: fault-tolerant streaming computation at scale, 2013, SOSP.

[27] Jignesh M. Patel, et al. Storm@twitter, 2014, SIGMOD Conference.

[28] Craig Chambers, et al. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing, 2015, Proc. VLDB Endow.

[29] Li Su, et al. Passive and Partially Active Fault Tolerance for Massively Parallel Stream Processing Engines, 2019, IEEE Trans. Knowl. Data Eng.

[30] Fred B. Schneider, et al. Implementing fault-tolerant services using the state machine approach: a tutorial, 1990, CSUR.