Processing LSTM in memory using hybrid network expansion model

With the rapidly increasing adoption of deep learning, LSTM-RNNs are widely used. However, their complex data dependencies and intensive computation limit the performance of accelerators. In this paper, we first propose a hybrid network expansion model to exploit fine-grained data parallelism. Based on this model, we implement a Reconfigurable Processing Unit (RPU) built from Processing-In-Memory (PIM) units. Our work shows that the gates and cells of an LSTM can be partitioned into fundamental operations and then recombined and mapped onto heterogeneous computing components. The experimental results show that, implemented in a 45 nm CMOS process, the proposed RPU occupies 1.51 mm², dissipates 413 mW, and achieves a power efficiency of 309 GOPS/W, which is 1.7× better than the state-of-the-art reconfigurable architecture.
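To make the partitioning idea concrete, the sketch below decomposes one LSTM step into the two classes of fundamental operations the abstract alludes to: dense matrix-vector products (the kind of workload a PIM crossbar handles well) and element-wise gate arithmetic (better suited to lightweight logic units). This is a minimal NumPy illustration under that assumed split, not the paper's RPU implementation; the helper name pim_matvec is hypothetical and simply marks where a PIM unit would operate.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pim_matvec(W, v):
    # Hypothetical stand-in for a PIM crossbar matrix-vector
    # multiply; a real RPU would compute this inside memory.
    return W @ v

def lstm_step(x, h, c, params):
    """One LSTM step split into two operation classes:
    (1) matrix-vector products, mapped (notionally) to PIM units;
    (2) element-wise nonlinearities and recombination, mapped to
        logic-style computing components."""
    W, U, b = params  # W: (4H, D), U: (4H, H), b: (4H,)
    H = h.shape[0]

    # Class 1: one fused matrix-vector product covering all four
    # gates (input, forget, candidate, output).
    z = pim_matvec(W, x) + pim_matvec(U, h) + b

    # Class 2: element-wise gate arithmetic.
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate
    g = np.tanh(z[2*H:3*H])      # candidate cell state
    o = sigmoid(z[3*H:4*H])      # output gate

    c_new = f * c + i * g        # cell update
    h_new = o * np.tanh(c_new)   # hidden output
    return h_new, c_new

# Usage with toy dimensions: D = input size, H = hidden size.
D, H = 8, 4
rng = np.random.default_rng(0)
params = (rng.standard_normal((4 * H, D)),
          rng.standard_normal((4 * H, H)),
          np.zeros(4 * H))
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.standard_normal(D), h, c, params)
```

Separating the step this way is what allows the gates and cells to be recombined and scheduled onto heterogeneous components, since every gate reduces to the same two primitive operation types.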
