S-Store: Streaming Meets Transaction Processing

Stream processing addresses the needs of real-time applications. Transaction processing addresses the coordination and safety of short atomic computations. Heretofore, these two modes of operation existed in separate, stove-piped systems. In this work, we attempt to fuse the two computational paradigms in a single system called S-Store. In this way, S-Store can simultaneously accommodate OLTP and streaming applications. We present a simple transaction model for streams that integrates seamlessly with a traditional OLTP system, and provides both ACID and stream-oriented guarantees. We chose to build S-Store as an extension of H-Store - an open-source, in-memory, distributed OLTP database system. By implementing S-Store in this way, we can make use of the transaction processing facilities that H-Store already provides, and we can concentrate on the additional features that are needed to support streaming. Similar implementations could be done using other main-memory OLTP platforms. We show that we can actually achieve higher throughput for streaming workloads in S-Store than an equivalent deployment in H-Store alone. We also show how this can be achieved within H-Store with the addition of a modest amount of new functionality. Furthermore, we compare S-Store to two state-of-the-art streaming systems, Esper and Apache Storm, and show how S-Store can sometimes exceed their performance while at the same time providing stronger correctness guarantees.

[1]  Michael Stonebraker,et al.  High-availability algorithms for distributed stream processing , 2005, 21st International Conference on Data Engineering (ICDE'05).

[2]  Leonardo Neumeyer,et al.  S4: Distributed Stream Computing Platform , 2010, 2010 IEEE International Conference on Data Mining Workshops.

[3]  Theodore Johnson,et al.  Query-Aware Partitioning for Monitoring Massive Network Data Streams , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[4]  Michael Stonebraker,et al.  H-store: a high-performance, distributed main memory transaction processing system , 2008, Proc. VLDB Endow..

[5]  Nesime Tatbul,et al.  Transactional stream processing , 2012, EDBT '12.

[6]  Abraham Silberschatz,et al.  Database Systems Concepts , 1997 .

[7]  Scott Shenker,et al.  Discretized streams: fault-tolerant streaming computation at scale , 2013, SOSP.

[8]  Lukasz Golab,et al.  Issues in data stream management , 2003, SGMD.

[9]  Ying Xing,et al.  The Design of the Borealis Stream Processing Engine , 2005, CIDR.

[10]  Michael Stonebraker,et al.  Rethinking main memory OLTP recovery , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[11]  Michael Stonebraker,et al.  Aurora: a new model and architecture for data stream management , 2003, The VLDB Journal.

[12]  David J. DeWitt,et al.  NiagaraCQ: a scalable continuous query system for Internet databases , 2000, SIGMOD '00.

[13]  M. Abadi,et al.  Naiad: a timely dataflow system , 2013, SOSP.

[14]  Theodore Johnson,et al.  Consistency in a Stream Warehouse , 2011, CIDR.

[15]  Daniel Mills,et al.  MillWheel: Fault-Tolerant Stream Processing at Internet Scale , 2013, Proc. VLDB Endow..

[16]  Lukasz Golab,et al.  On Concurrency Control in Sliding Window Queries over Data Streams , 2006, EDBT.

[17]  Andrew V. Goldberg,et al.  Quincy: fair scheduling for distributed computing clusters , 2009, SOSP '09.

[18]  Elke A. Rundensteiner,et al.  Active Complex Event Processing over Event Streams , 2011, Proc. VLDB Endow..

[19]  Michael Stonebraker,et al.  Fault-tolerance in the Borealis distributed stream processing system , 2005, SIGMOD '05.

[20]  Jignesh M. Patel,et al.  Storm@twitter , 2014, SIGMOD Conference.

[21]  Abraham Silberschatz,et al.  Database System Concepts , 1980 .

[22]  Frederick Reiss,et al.  TelegraphCQ: Continuous Dataflow Processing for an Uncertain World , 2003, CIDR.

[23]  Raul Castro Fernandez,et al.  Integrating scale out and fault tolerance in stream processing using operator state management , 2013, SIGMOD '13.

[24]  Carlo Curino,et al.  Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems , 2012, SIGMOD Conference.

[25]  Laura M. Haas,et al.  SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems , 2010, Proc. VLDB Endow..

[26]  Craig Freedman,et al.  Hekaton: SQL server's memory-optimized OLTP engine , 2013, SIGMOD '13.

[27]  Raul Castro Fernandez,et al.  Making State Explicit for Imperative Big Data Processing , 2014, USENIX Annual Technical Conference.

[28]  Michael Stonebraker,et al.  S-Store: A Streaming NewSQL System for Big Velocity Applications , 2014, Proc. VLDB Endow..

[29]  Jennifer Widom,et al.  Towards a streaming SQL standard , 2008, Proc. VLDB Endow..

[30]  Dennis Shasha,et al.  The Virtues and Challenges of Ad Hoc + Streams Querying in Finance , 2003, IEEE Data Eng. Bull..

[31]  Michael Stonebraker,et al.  A Demonstration of the BigDAWG Polystore System , 2015, Proc. VLDB Endow..

[32]  Marc H. Scholl,et al.  Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery , 2001, SGMD.

[33]  Gottfried Vossen,et al.  Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery , 2002 .

[34]  Jennifer Widom,et al.  STREAM: The Stanford Data Stream Management System , 2016, Data Stream Management.

[35]  Eric A. Brewer,et al.  Highly available, fault-tolerant, parallel dataflows , 2004, SIGMOD '04.

[36]  Michael Stonebraker,et al.  Linear Road: A Stream Data Management Benchmark , 2004, VLDB.

[37]  Badrish Chandramouli,et al.  Trill: A High-Performance Incremental Query Processor for Diverse Analytics , 2014, Proc. VLDB Endow..