Staleness Control for Edge Data Analytics

A new generation of cyber-physical systems has emerged in which large numbers of devices continuously generate and consume massive amounts of data in a distributed and mobile manner. Accurate, near real-time decisions based on such streaming data are in high demand in many areas of optimization for these systems. Edge data analytics brings processing power close to the data sources, reduces the network delay of data transmission, allows large-scale distributed training, and consequently helps meet real-time requirements. Nevertheless, the multiplicity of data sources leads to multiple distributed machine learning models that may suffer from sub-optimal performance due to inconsistency in their states. In this work, we tackle the insularity, concept drift, and connectivity issues in edge data analytics to minimize its accuracy handicap without losing its timeliness benefits. To this end, we propose an efficient model synchronization mechanism for distributed and stateful data analytics. Staleness Control for Edge Data Analytics (SCEDA) keeps the synchronization frequency highly adaptable in the face of an unpredictable environment by addressing the trade-off between the generality and timeliness of the model.
