Modeling and Simulation of Spark Streaming

As more and more devices connect to the Internet of Things, unbounded streams of data will be generated, which have to be processed "on the fly" in order to trigger automated actions and deliver real-time services. Spark Streaming is a popular real-time stream processing framework. Making efficient use of Spark Streaming and achieving stable stream processing requires a careful interplay between different parameter configurations. Misconfiguration may lead to significant resource overprovisioning and poor performance. To alleviate such issues, this paper develops an executable and configurable model named SSP (short for Spark Streaming Processing) to model and simulate Spark Streaming. SSP is written in ABS, a formal, executable, and object-oriented language for modeling distributed systems by means of concurrent object groups. SSP allows users to rapidly evaluate and compare different parameter configurations without deploying their applications on a cluster or cloud. The simulation results show that SSP is able to mimic Spark Streaming in different scenarios.
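
To illustrate why parameter configurations interact, the following is a minimal sketch in Python. It is not SSP and not written in ABS; it is a toy model built on two assumptions: a new micro-batch becomes ready at the end of every batch interval, and batches are processed one at a time. The function name `simulate_batches` and all numeric values are hypothetical and chosen only for illustration. The sketch shows that once the per-batch processing time exceeds the batch interval, the scheduling delay grows with every batch, which is one common form of unstable stream processing.

```python
def simulate_batches(batch_interval, processing_times):
    """Return the scheduling delay (in seconds) of each micro-batch.

    Toy assumptions: batch i becomes ready at (i + 1) * batch_interval,
    and batches are processed sequentially, so a batch starts only once
    it is ready and the previous batch has finished.
    """
    delays = []
    previous_finish = 0.0
    for i, processing_time in enumerate(processing_times):
        ready_at = (i + 1) * batch_interval
        start = max(ready_at, previous_finish)
        delays.append(start - ready_at)
        previous_finish = start + processing_time
    return delays


# Hypothetical configurations: a 2 s batch interval with 1.5 s vs 2.5 s
# of processing time per batch.
stable = simulate_batches(batch_interval=2.0, processing_times=[1.5] * 5)
unstable = simulate_batches(batch_interval=2.0, processing_times=[2.5] * 5)
```

In the first configuration every batch starts as soon as it is ready. In the second, each batch waits longer than the one before it. A full model such as SSP would also have to account for factors this sketch ignores, for example the resources available to the application.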
