Recurrent Neural Networks for Edge Intelligence

Recurrent Neural Networks (RNNs) are pervasive in artificial intelligence applications such as speech recognition, predictive healthcare, and creative art. Although they deliver accurate, superior solutions, they pose a massive challenge: "training havoc." At the same time, the rapid expansion of IoT demands that intelligent models be deployed at the edge, yet the growing model sizes and increasingly complex network architectures designed for greater performance work against portability to edge devices with real-time constraints on memory, latency, and energy. This article provides a detailed insight into the compression techniques widely disseminated in the deep learning regime, which have become key to mapping powerful RNNs onto resource-constrained devices. While compression of RNNs is the main focus of the survey, it also highlights the challenges encountered during training, since the training procedure directly influences both model performance and compressibility. Recent advances that address these training challenges are discussed, along with their strengths and drawbacks. In short, the survey covers a three-step process: selecting a suitable architecture, training it efficiently, and applying a compression technique appropriate for a resource-constrained environment. It thus serves as a comprehensive guide that a developer can adapt when building an RNN solution for a time-series problem at the edge.
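
To make the compression step concrete, the listing below is a minimal, illustrative sketch of two techniques commonly covered in this space: magnitude-based weight pruning followed by symmetric 8-bit linear quantization of a recurrent weight matrix. It is not the survey's prescribed method; the function names, the 80% sparsity level, and the int8 scheme are assumptions made only for illustration, using plain NumPy.

    # Hypothetical sketch: magnitude pruning + int8 quantization of an RNN weight matrix.
    # All names, thresholds, and the quantization scheme are illustrative assumptions.
    import numpy as np

    def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
        """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
        threshold = np.quantile(np.abs(weights), sparsity)
        return np.where(np.abs(weights) < threshold, 0.0, weights)

    def quantize_linear_8bit(weights: np.ndarray):
        """Map float weights to int8 with a single symmetric scale factor."""
        scale = np.max(np.abs(weights)) / 127.0 if np.any(weights) else 1.0
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale  # dequantize with q.astype(np.float32) * scale

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Stand-in for the hidden-to-hidden weight matrix of a small RNN layer.
        w_hh = rng.normal(size=(128, 128)).astype(np.float32)
        w_sparse = prune_by_magnitude(w_hh, sparsity=0.8)   # keep the largest 20%
        q, scale = quantize_linear_8bit(w_sparse)
        print("nonzero weights:", np.count_nonzero(w_sparse), "of", w_sparse.size)
        print("max dequantization error:", np.abs(w_sparse - q.astype(np.float32) * scale).max())

In practice such a one-shot sketch would be followed by fine-tuning (or an iterative prune-retrain schedule), since the survey's point is precisely that the training procedure and the compression technique must be chosen together for a given edge deployment.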
