Consistent Stream Processing: Doctoral Symposium

Stream Processors (SPs) continuously transform large volumes of streaming input using a computational model that is inherently distributed, scalable, and fault-tolerant. For these reasons they are used in application environments where near real-time computation is of paramount importance, such as stock option analysis, fraud detection, monitoring, and real-time data analytics for web applications. In many application domains, SPs are used in conjunction with data management systems, such as transactional databases and data warehouses, that store intermediate or final results produced by the SPs. However, SPs have no control over the consistency guarantees of the results they write to these external components. We propose a novel approach, which we call consistent stream processing, that integrates the external state of databases within the SP and enforces consistency guarantees both on state updates and on external queries. We extend the computational model of SPs with transactions and present two strategies for enforcing their transactional properties.
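As a rough illustration of what integrating state into the SP with transactional updates could look like, the following Python sketch processes a stream of transfers against an in-memory state. It is not the model or either of the strategies proposed in this work; the names `TransactionalState`, `process_transfer`, and `run`, and the money-transfer scenario, are hypothetical and chosen only for illustration. Each stream element is treated as one transaction whose writes are installed all at once or not at all, so a query that reads a snapshot never observes a partially applied update.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Transfer = Tuple[str, str, int]  # (source account, destination account, amount)


@dataclass
class TransactionalState:
    """Hypothetical in-memory stand-in for state that would otherwise live in an external database."""
    committed: Dict[str, int] = field(default_factory=dict)

    def apply(self, writes: Dict[str, int]) -> None:
        # Install every write of a transaction by swapping in a new dict,
        # so a reader sees either none or all of the transaction's writes.
        self.committed = {**self.committed, **writes}

    def snapshot(self) -> Dict[str, int]:
        # External queries read a copy of the last committed state.
        return dict(self.committed)


def process_transfer(state: TransactionalState, transfer: Transfer) -> bool:
    """Treat one stream element as a transaction; return True if it committed."""
    src, dst, amount = transfer
    balances = state.snapshot()
    if balances.get(src, 0) < amount:
        return False  # abort: nothing is written
    writes: Dict[str, int] = {}
    writes[src] = balances.get(src, 0) - amount
    # Read dst through the pending writes so that src == dst stays correct.
    writes[dst] = writes.get(dst, balances.get(dst, 0)) + amount
    state.apply(writes)
    return True


def run(stream: Iterable[Transfer], state: TransactionalState) -> List[bool]:
    # Single-threaded, in-order processing: a deliberately simple assumption.
    return [process_transfer(state, t) for t in stream]


state = TransactionalState({"a": 100, "b": 0})
results = run([("a", "b", 30), ("b", "a", 50), ("a", "b", 200)], state)
```

In this sketch the total balance across accounts is the same in every snapshot, which is the kind of invariant an external query would expect to hold. A distributed SP would need a real concurrency-control or ordering strategy to obtain the same guarantee; the sketch sidesteps that by running serially.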