Input-Dependent Edge-Cloud Mapping of Recurrent Neural Networks Inference

Given the computational complexity of Recurrent Neural Network (RNN) inference, IoT and mobile devices typically offload this task to the cloud. However, the execution time and energy consumption of RNN inference depend strongly on the length of the processed input. Therefore, once communication costs are also taken into account, it may be more convenient to process short input sequences locally and offload only long ones to the cloud. In this paper, we propose a low-overhead runtime tool that makes this choice automatically. Results based on real edge and cloud devices show that our method simultaneously reduces the total execution time and energy consumption of the system compared to solutions that run RNN inference fully locally or fully in the cloud.
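As a rough illustration of the decision the tool automates, the sketch below compares a simple latency model for local versus cloud execution as a function of input length. All parameter values (per-step edge and cloud latencies, network round-trip time, per-step transfer time) are hypothetical placeholders, not figures from the paper, and energy is omitted for brevity.

# Minimal sketch of an input-length-based offloading policy in the spirit
# of the abstract. Every numeric parameter below is an assumed placeholder.

def offload_decision(seq_len: int,
                     edge_time_per_step: float = 5e-3,   # s per RNN step on the edge device (assumed)
                     cloud_time_per_step: float = 5e-4,  # s per RNN step on the cloud server (assumed)
                     rtt: float = 50e-3,                 # network round-trip time in s (assumed)
                     tx_time_per_step: float = 1e-4      # transfer time per input step in s (assumed)
                     ) -> str:
    """Return 'edge' or 'cloud' depending on which mapping has lower modeled latency."""
    # Local cost grows linearly with sequence length.
    edge_cost = seq_len * edge_time_per_step
    # Cloud cost pays a fixed round-trip penalty plus per-step compute and transfer.
    cloud_cost = rtt + seq_len * (cloud_time_per_step + tx_time_per_step)
    return "edge" if edge_cost <= cloud_cost else "cloud"

if __name__ == "__main__":
    # Short sequences stay local; long ones amortize the round-trip and move to the cloud.
    for n in (5, 50, 500):
        print(f"length={n:4d} -> {offload_decision(n)}")

Under this simple model the break-even length is where the two linear cost curves cross; the actual tool would calibrate such a threshold from measured device and network characteristics rather than fixed constants.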
