Fast and Reliable Stream Processing over Wide Area Networks

We present a replication-based approach that enables both fast and reliable stream processing over wide area networks. Our approach replicates stream processing operators so that the replicas of each operator compete with each other to make the earliest impact. As a result, any processing downstream from such replicas can proceed by relying on the fastest replica without being held back by slow or failed ones. Furthermore, our approach allows replicas to produce output in different orders, which avoids the cost of forcing an identical execution across replicas without sacrificing correctness. We first consider semantic issues for correct replicated stream processing and, based on a formal foundation, extend common stream-processing primitives. Next, we discuss strategies for deploying replicas. Finally, we present preliminary results obtained from experiments on PlanetLab that substantiate the potential benefits of our approach.
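The idea of a downstream consumer relying on the fastest replica can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes, purely for illustration, that every logical tuple carries a unique identifier and that each replica's deliveries are timestamped on arrival. The names `Delivery` and `first_arrival_merge` and the sample data are hypothetical.

```python
from typing import Hashable, Iterable, Iterator, Tuple

# One delivery: (arrival_time, replica_name, tuple_id, payload).
# This record layout is an assumption made for the sketch.
Delivery = Tuple[float, str, Hashable, object]


def first_arrival_merge(deliveries: Iterable[Delivery]) -> Iterator[Delivery]:
    """Yield each logical tuple once, from whichever replica delivered it first."""
    seen = set()
    # Process deliveries in arrival order, regardless of which replica sent them.
    for delivery in sorted(deliveries, key=lambda d: d[0]):
        tuple_id = delivery[2]
        if tuple_id in seen:
            continue  # duplicate from a slower replica; drop it
        seen.add(tuple_id)
        yield delivery


# Two replicas emit the same logical tuples in different orders;
# replica_a stops before producing t3 (simulating a failure).
deliveries = [
    (1.0, "replica_a", "t1", 10),
    (1.4, "replica_b", "t2", 20),
    (1.5, "replica_b", "t1", 10),
    (2.9, "replica_a", "t2", 20),
    (3.0, "replica_b", "t3", 30),
]

merged = list(first_arrival_merge(deliveries))
for arrival_time, replica, tuple_id, payload in merged:
    print(f"{arrival_time:.1f}s {tuple_id}={payload} (from {replica})")
```

In this sketch the consumer makes progress on `t1` and `t2` as soon as either replica delivers them, and still receives `t3` even though one replica never produces it. Deduplicating by identifier is what lets the replicas emit tuples in different orders.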
