Operational Stream Processing: Towards Scalable and Consistent Event-Driven Applications

Over the last decade, we have witnessed the widespread adoption of architectural styles such as microservices for building event-driven software applications and deploying them on cloud infrastructure. Such services favor separating a database into independent silos of data, each owned entirely by a single service. As a result, traditional OLTP systems no longer fit the architectural picture, and developers often turn to ad hoc solutions that rarely support ACID transactions. At the same time, distributed streaming dataflow systems have been gradually maturing. These systems have moved beyond the mere analysis of streaming windows and complex event processing, and now employ sophisticated methods for managing state, keeping it consistent, and ensuring exactly-once processing guarantees in the presence of failures. The goal of this paper is threefold. First, we illustrate the requirements of stateful software services in terms of consistency and scalability. Second, we assess how well existing solutions meet those requirements. Finally, we outline a set of challenging problems and propose research directions for enabling event-driven applications to be developed on top of streaming dataflow systems. We strongly believe that streaming dataflows can take a central place in service-oriented architectures by taking over the execution of ACID transactions and ensuring message delivery and processing, thereby enabling the scalable execution of services.
