Distributed Deep Convolutional Neural Networks for the Internet-of-Things

Due to the high demand in computation and memory, deep learning solutions are mostly restricted to high-performance computing units, e.g., those present in servers, Cloud, and computing centers. In pervasive systems, e.g., those involving Internet-of-Things (IoT) technological solutions, this would require the transmission of acquired data from IoT sensors to the computing platform and wait for its output. This solution might become infeasible when remote connectivity is either unavailable or limited in bandwidth. Moreover, it introduces uncertainty in the "data production to decision making"-latency, which, in turn, might impair control loop stability if the response should be used to drive IoT actuators. In order to support a real-time recall phase directly at the IoT level, deep learning solutions must be completely rethought having in mind the constraints on memory and computation characterizing IoT units. In this paper we focus on Convolutional Neural Networks (CNNs), a specific deep learning solution for image and video classification, and introduce a methodology aiming at distributing their computation onto the units of the IoT system. We formalize such a methodology as an optimization problem where the latency between the data-gathering phase and the subsequent decision-making one is minimized. The methodology supports multiple IoT sources of data as well as multiple CNNs in execution on the same IoT system, making it a general-purpose distributed computing platform for CNN-based applications demanding autonomy, low decision-latency, and high Quality-of-Service.

[1]  Gu-Yeon Wei,et al.  Benchmarking TPU, GPU, and CPU Platforms for Deep Learning , 2019, ArXiv.

[2]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[3]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Luca Benini,et al.  YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights , 2016, 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).

[5]  Cesare Alippi,et al.  Moving Convolutional Neural Networks to Embedded Systems: The AlexNet and VGG-16 Case , 2018, 2018 17th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN).

[6]  H. T. Kung,et al.  Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).

[7]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[9]  Bhaskar Krishnamachari,et al.  Fast and Accurate Streaming CNN Inference via Communication Compression on the Edge , 2020, 2020 IEEE/ACM Fifth International Conference on Internet-of-Things Design and Implementation (IoTDI).

[10]  Cesare Alippi,et al.  The (Not) Far-Away Path to Smart Cyber-Physical Systems: An Information-Centric Framework , 2017, Computer.

[11]  Mohsen Guizani,et al.  Deep Learning for IoT Big Data and Streaming Analytics: A Survey , 2017, IEEE Communications Surveys & Tutorials.

[12]  Jian Sun,et al.  Convolutional neural networks at constrained time cost , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Venkatesh Saligrama,et al.  Adaptive Neural Networks for Efficient Inference , 2017, ICML.

[14]  Luca Benini,et al.  Origami: A 803-GOp/s/W Convolutional Network Accelerator , 2015, IEEE Transactions on Circuits and Systems for Video Technology.

[15]  Kartikeya Bhardwaj,et al.  Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT , 2019, ACM Trans. Embed. Comput. Syst..

[16]  Deborah Estrin,et al.  The impact of data aggregation in wireless sensor networks , 2002, Proceedings 22nd International Conference on Distributed Computing Systems Workshops.

[17]  Yoshua Bengio,et al.  BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.

[18]  Marian Verhelst,et al.  14.5 Envision: A 0.26-to-10TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable Convolutional Neural Network processor in 28nm FDSOI , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[19]  Yu Cao,et al.  Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks , 2016, FPGA.

[20]  Joan Bruna,et al.  Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation , 2014, NIPS.

[21]  M. Roveri,et al.  Incremental On-Device Tiny Machine Learning , 2020, AIChallengeIoT@SenSys.

[22]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[23]  Philip S. Yu,et al.  Distributed Deep Learning Model for Intelligent Video Surveillance Systems with Edge Computing , 2019, IEEE Transactions on Industrial Informatics.

[24]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[25]  Bhaskar Krishnamachari,et al.  Throughput Optimized Scheduler for Dispersed Computing Systems , 2019, 2019 7th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud).

[26]  Rongrong Ji,et al.  Holistic CNN Compression via Low-Rank Decomposition with Knowledge Transfer , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Jian Sun,et al.  Deep Learning with Low Precision by Half-Wave Gaussian Quantization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Ellen W. Zegura,et al.  Serendipity: enabling remote computing among intermittently connected mobile devices , 2012, MobiHoc '12.

[29]  Song Guo,et al.  Just-in-Time Code Offloading for Wearable Computing , 2015, IEEE Transactions on Emerging Topics in Computing.

[30]  Xiangyu Zhang,et al.  Channel Pruning for Accelerating Very Deep Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Andreas Gerstlauer,et al.  DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[32]  Qun Li,et al.  eSGD: Communication Efficient Distributed Deep Learning on the Edge , 2018, HotEdge.

[33]  Manuel Roveri,et al.  Reducing the Computation Load of Convolutional Neural Networks through Gate Classification , 2018, 2018 International Joint Conference on Neural Networks (IJCNN).

[34]  Rupak Kharel,et al.  A survey on the challenges and opportunities of the Internet of Things (IoT) , 2017, 2017 Eleventh International Conference on Sensing Technology (ICST).

[35]  Bruno Sericola,et al.  Distributed deep learning on edge-devices: Feasibility via adaptive compression , 2017, 2017 IEEE 16th International Symposium on Network Computing and Applications (NCA).

[36]  Björn Schuller,et al.  Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments , 2017 .

[37]  Yang Xiao,et al.  IEEE 802.11n: enhancements for higher throughput in wireless LANs , 2005, IEEE Wirel. Commun..

[38]  Jaume Barceló,et al.  IEEE 802.11AH: the WiFi approach for M2M communications , 2014, IEEE Wireless Communications.

[39]  David Lillethun,et al.  Mobile fog: a programming model for large-scale applications on the internet of things , 2013, MCC '13.

[40]  Dawei Li,et al.  DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices , 2017, AAAI.

[41]  Choong Seon Hong,et al.  Multi-agent and reinforcement learning based code offloading in mobile fog , 2016, 2016 International Conference on Information Networking (ICOIN).

[42]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[43]  Frank Dürr,et al.  Optimal predictive code offloading , 2014, MobiQuitous.

[44]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[45]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Eric P. Xing,et al.  GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server , 2016, EuroSys.

[47]  Derek C. Rose,et al.  Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier] , 2010, IEEE Computational Intelligence Magazine.

[48]  Jason Cong,et al.  Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.