Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems

Event stream processing (ESP) applications target the real-time processing of large volumes of data. Events traverse a graph of stream processing operators, where the information of interest is extracted. As these applications gain popularity, the requirements for scalability, availability, and dependability increase. In terms of dependability and availability, many applications require precise recovery, i.e., a guarantee that the outputs during and after a recovery are the same as if the failure that triggered the recovery had never occurred. Existing solutions for precise recovery induce prohibitive latency costs, either by requiring continuous checkpointing or logging (in a passive replication approach) or by requiring perfect synchronization between replicas executing the same operations (in an active replication approach). We introduce a novel technique that guarantees precise recovery for ESP applications while minimizing latency costs compared to traditional approaches. The technique minimizes latency via speculative execution in a distributed system. In terms of scalability, the key component of our approach is a modified software transactional memory that provides not only the speculation capabilities but also optimistic parallelization of costly operations.
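As a rough illustration of the idea of speculating ahead of logging, the following minimal sketch shows an operator that emits outputs before its input is known to be stably logged, then either commits or discards that work. It is not the technique described above: it uses a plain pending list rather than a software transactional memory, it runs in a single process, and every name in it (`SpeculativeCounter`, `process`, `commit`, `abort`) is a hypothetical chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeCounter:
    """Hypothetical operator that counts events per key (illustration only)."""

    # State derived only from events that are assumed to be stably logged.
    committed: dict = field(default_factory=dict)
    # Events already processed speculatively but not yet stably logged.
    pending: list = field(default_factory=list)

    def process(self, key):
        """Emit an output immediately instead of waiting for the log."""
        self.pending.append(key)
        count = self.committed.get(key, 0) + self.pending.count(key)
        return (key, count, "speculative")

    def commit(self, n):
        """Fold the first n pending events into the committed state."""
        for key in self.pending[:n]:
            self.committed[key] = self.committed.get(key, 0) + 1
        del self.pending[:n]

    def abort(self):
        """Discard speculative work and return the events to replay."""
        lost, self.pending = self.pending, []
        return lost


op = SpeculativeCounter()
op.process("a")
before_failure = op.process("a")   # second "a", still speculative
op.commit(1)                       # only the first event is stably logged
replay = op.abort()                # simulated failure drops speculative work
after_recovery = [op.process(key) for key in replay]
# Precise recovery in this toy setting: the replayed output matches the
# output produced before the simulated failure.
assert after_recovery == [before_failure]
```

In this sketch the output is available without waiting for the log, and a downstream consumer would treat it as final only once the corresponding events have been committed.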
