Edge Intelligence: Paving the Last Mile of Artificial Intelligence With Edge Computing

With the breakthroughs in deep learning, recent years have witnessed a boom in artificial intelligence (AI) applications and services, ranging from personal assistants to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and the Internet of Things (IoT), billions of mobile and IoT devices have been connected to the Internet, generating zillions of bytes of data at the network edge. Driven by this trend, there is an urgent need to push the AI frontier to the network edge so as to fully unleash the potential of edge big data. To meet this demand, edge computing, an emerging paradigm that pushes computing tasks and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting interdisciplinary field, edge AI or edge intelligence (EI), is beginning to receive a tremendous amount of interest. However, research on EI is still in its infancy, and a dedicated venue for exchanging recent advances in EI is highly desired by both the computer systems and AI communities. To this end, we conduct a comprehensive survey of recent research efforts on EI. Specifically, we first review the background and motivation for running AI at the network edge. We then provide an overview of the overarching architectures, frameworks, and emerging key technologies for deep learning model training and inference at the network edge. Finally, we discuss future research opportunities on EI. We believe that this survey will attract growing attention, stimulate fruitful discussions, and inspire further research ideas on EI.
