Investigating Edge vs. Cloud Computing Trade-offs for Stream Processing

The recent spectacular rise of the Internet of Things and the resulting growth of the data deluge have motivated the emergence of Edge computing as a means to move processing from centralized Clouds towards decentralized processing units close to the data sources. This shift raises new challenges in how to distribute processing across Cloud-based, Edge-based, or hybrid Cloud/Edge-based infrastructures. In particular, a major question is: how much can one improve (or degrade) the performance of an application by performing computation closer to the data sources rather than in the Cloud? This paper proposes a methodology to understand such performance trade-offs and illustrates it through an experimental evaluation with two real-life stream processing use cases executed on fully-Cloud and hybrid Cloud-Edge testbeds, using state-of-the-art processing engines for each environment. We derive a set of takeaways for the community, highlighting the limitations of each environment, the scenarios that could benefit from hybrid Edge-Cloud deployments, and which parameters affect performance and how.
