Integrating real-time and batch processing in a polystore

This paper describes S-Store, a stream processing engine, and its role in the BigDAWG polystore. Fundamentally, S-Store acts as a front-end processor that accepts input from multiple sources, removes errors from it (data cleaning), and translates it into a form that BigDAWG can ingest efficiently. S-Store also acts as an intelligent router that sends input tuples to the appropriate components of BigDAWG. All updates to S-Store's shared memory are transactionally consistent (ACID), which eliminates the errors that non-synchronized reads and writes would otherwise introduce. The ability to migrate data between components of BigDAWG is crucial. As a first proof of concept, we have implemented a migrator from S-Store to Postgres, which we describe here. We report some interesting results obtained with this migrator that affect the evaluation of query plans.
