Optimal Input-Dependent Edge-Cloud Partitioning for RNN Inference

Recurrent Neural Networks (RNNs), such as those based on the Long Short-Term Memory (LSTM) architecture, are state-of-the-art deep learning models for sequence analysis. Because RNN inference is computationally demanding, IoT devices typically offload it to a cloud server. However, the cost of RNN inference strongly depends on the length of the processed input sequence. Therefore, once communication time is taken into account, it may be faster to process short input sequences locally and offload only long ones to the cloud. In this paper, we propose a low-overhead runtime tool that makes this decision automatically. Results based on performance profiling of real edge and cloud devices show that our method reduces the total execution time of the system by up to 20% compared to solutions that execute RNN inference fully locally or fully in the cloud.
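The length-dependent offloading decision can be illustrated with a minimal sketch. It assumes that edge inference time, cloud inference time, and input transmission time each grow linearly with the sequence length L, with constants that would in practice come from profiling; the names `LinearLatency` and `choose_target` and all numeric constants are hypothetical and are not the authors' tool or measurements.

```python
from dataclasses import dataclass


@dataclass
class LinearLatency:
    """Hypothetical latency model t(L) = per_step * L + fixed, in seconds."""
    per_step: float  # cost per input time step (assumed constant)
    fixed: float     # length-independent overhead

    def __call__(self, length: int) -> float:
        return self.per_step * length + self.fixed


def choose_target(length: int, edge: LinearLatency,
                  cloud: LinearLatency, link: LinearLatency) -> str:
    """Return 'edge' or 'cloud', whichever has the lower estimated latency.

    Offloading pays the transmission time of the input (link) plus the
    cloud inference time; local execution pays only the edge inference time.
    """
    local_time = edge(length)
    offload_time = link(length) + cloud(length)
    return "edge" if local_time <= offload_time else "cloud"


# Illustrative constants only, NOT measurements from the paper.
EDGE = LinearLatency(per_step=2e-3, fixed=1e-3)    # slow per step, no network
CLOUD = LinearLatency(per_step=1e-4, fixed=5e-3)   # fast per step
LINK = LinearLatency(per_step=1e-5, fixed=50e-3)   # dominated by round-trip time

if __name__ == "__main__":
    for n in (10, 100):
        print(n, choose_target(n, EDGE, CLOUD, LINK))
```

With these made-up constants the crossover falls at roughly 29 time steps: a 10-step sequence stays on the edge, while a 100-step sequence is offloaded. Because the comparison depends only on L, a runtime tool could precompute such a threshold and decide with a single comparison per input, which is one plausible way to keep the overhead low.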
