Maximizing Determinism in Stream Processing Under Latency Constraints

Reconciling determinism with latency constraints is challenging in distributed data stream processing systems, which must process high-volume data streams arriving from multiple unsynchronized input sources. To process streaming data deterministically, such systems need mechanisms that synchronize the order in which operators process tuples. Achieving real-time response, on the other hand, requires a careful trade-off between determinism and low latency. We build on ScaleGate, a recently proposed approach to data exchange and synchronization in stream processing that guarantees determinism and has an efficient lock-free implementation, enabling high scalability. To address the trade-offs implied by real-time constraints, we propose a system comprising (a) a novel data structure called Slack-ScaleGate (SSG), together with its algorithmic implementation, which guarantees deterministic processing of tuples as long as they can meet their latency constraints, and (b) a method that dynamically tunes the maximum time a tuple can wait in the SSG data structure, relaxing the determinism guarantees when needed to satisfy the latency constraints. Our detailed experimental evaluation, using a traffic monitoring application deployed in the city of Dublin, illustrates how the approach works and what benefits it brings.
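The interplay between deterministic ordering and a bounded waiting time can be illustrated with a minimal, single-threaded sketch. It is not the lock-free SSG implementation described above. The class name `SlackMergeBuffer`, its methods, and the `max_wait` parameter are hypothetical names chosen for illustration. The sketch assumes that each source emits tuples with increasing timestamps, and that a tuple is safe to release once every source has reported a timestamp at least as large. A tuple that has waited `max_wait` or longer is released anyway and flagged as non-deterministic.

```python
import heapq


class SlackMergeBuffer:
    """Illustrative slack-bounded merge of timestamped tuples (not the authors' SSG)."""

    def __init__(self, num_sources, max_wait):
        self.max_wait = max_wait             # hypothetical slack: longest allowed wait
        self.latest = [None] * num_sources   # latest timestamp seen per source
        self.heap = []                       # entries: (ts, source, seq, arrival, payload)
        self.seq = 0                         # unique counter so payloads are never compared

    def add(self, source, ts, payload, now):
        """Buffer a tuple from `source` that arrived at wall-clock time `now`."""
        self.latest[source] = ts
        heapq.heappush(self.heap, (ts, source, self.seq, now, payload))
        self.seq += 1

    def _watermark(self):
        """Smallest 'latest timestamp' over all sources, or None if a source is silent."""
        if any(t is None for t in self.latest):
            return None
        return min(self.latest)

    def poll(self, now):
        """Release tuples in timestamp order as (ts, payload, deterministic) triples."""
        wm = self._watermark()
        # Timestamps of buffered tuples whose waiting time has reached the slack.
        expired = [e[0] for e in self.heap if now - e[3] >= self.max_wait]
        forced = max(expired) if expired else None
        out = []
        while self.heap:
            ts, _source, _seq, _arrival, payload = self.heap[0]
            ready = wm is not None and ts <= wm          # safe: no earlier tuple can arrive
            overdue = forced is not None and ts <= forced  # slack exhausted: release anyway
            if not (ready or overdue):
                break
            heapq.heappop(self.heap)
            out.append((ts, payload, ready))
        return out


buf = SlackMergeBuffer(num_sources=2, max_wait=5.0)
buf.add(0, ts=10, payload="a", now=0.0)
print(buf.poll(now=1.0))   # nothing released: source 1 has not reported yet
buf.add(1, ts=12, payload="b", now=1.0)
print(buf.poll(now=2.0))   # "a" released deterministically
print(buf.poll(now=6.0))   # "b" released because its slack ran out
```

In this sketch a larger `max_wait` favors determinism at the cost of latency, and a smaller one does the opposite. A dynamic tuning method like the one in part (b) would adjust that value at run time; the sketch keeps it fixed.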
