CRIME: Input-Dependent Collaborative Inference for Recurrent Neural Networks

The excellent accuracy of Recurrent Neural Networks (RNNs) for time-series and natural language processing comes at the cost of high computational complexity. Consequently, choosing between edge and cloud computing for RNN inference, with the goal of minimizing response time or energy consumption, is not trivial. An edge approach must cope with this complexity, while a cloud solution pays large time and energy costs for data transmission. Collaborative inference is a technique that tries to obtain the best of both worlds by splitting the inference task among a network of collaborating devices. While already investigated for other types of neural networks, collaborative inference for RNNs poses new challenges, such as the strong influence of input length on processing time and energy, and remains largely unexplored. In this article, we introduce a Collaborative RNN Inference Mapping Engine (CRIME), which automatically selects the best inference device for each input. CRIME is flexible with respect to the connection topology among collaborating devices, and adapts to changes in connection status and device load. With experiments on several RNNs and datasets, we show that CRIME can reduce the execution time (or end-node energy) by more than 25 percent compared to any single-device approach.
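
The idea of input-dependent mapping can be illustrated with a minimal sketch. It is not the CRIME implementation: it assumes, purely for illustration, that compute time grows linearly with input length, that each remote device adds a fixed round-trip cost, and that device load scales compute time by a constant factor. The `Device` class, its fields, `estimated_time`, `select_device`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical description of one collaborating device."""
    name: str
    time_per_element: float   # seconds per input element (assumed linear model)
    round_trip: float = 0.0   # seconds of transmission cost; 0 for the end-node
    load_factor: float = 1.0  # >1 means the device is currently busier


def estimated_time(device: Device, input_length: int) -> float:
    """Estimate response time as load-scaled compute time plus transmission."""
    compute = device.load_factor * device.time_per_element * input_length
    return compute + device.round_trip


def select_device(devices: list[Device], input_length: int) -> Device:
    """Pick the device with the smallest estimated response time."""
    return min(devices, key=lambda d: estimated_time(d, input_length))


# Illustrative values only: a slow local end-node and a fast but distant cloud.
edge = Device("edge", time_per_element=0.010)
cloud = Device("cloud", time_per_element=0.001, round_trip=0.090)

short_choice = select_device([edge, cloud], input_length=5)
long_choice = select_device([edge, cloud], input_length=50)
print(short_choice.name, long_choice.name)
```

Under these assumed numbers, short inputs stay on the end-node because the transmission cost dominates, while long inputs are offloaded. Updating `round_trip` or `load_factor` at run time is one simple way to model the adaptation to connection status and device load described above.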
