Evaluation of distributed stream processing frameworks for IoT applications in Smart Cities

The widespread growth of Big Data and the evolution of Internet of Things (IoT) technologies enable cities to obtain valuable intelligence from large amounts of data produced in real time. In a Smart City, various IoT devices continuously generate streams of data that must be analyzed within a short period of time using Big Data techniques. Distributed stream processing frameworks (DSPFs) have the capacity to handle real-time data processing for Smart Cities. In this paper, we examine the applicability of DSPFs at the data processing layer of a Smart City and appraise the current state of their adoption and maturity among IoT applications. Our experiments evaluate the performance of three DSPFs: Apache Storm, Apache Spark Streaming, and Apache Flink. According to our results, choosing a proper framework for the data analytics layer of a Smart City requires sufficient knowledge of the characteristics of the target applications. We conclude that each of the frameworks studied here has its own advantages and disadvantages. Our experiments show that Storm and Flink have very similar performance, whereas Spark Streaming has much higher latency but provides higher throughput.
