FPGA-based accelerator for long short-term memory recurrent neural networks

Long Short-Term Memory recurrent neural networks (LSTM-RNNs) have been widely used for speech recognition, machine translation, scene analysis, and other sequence-processing tasks. Unfortunately, general-purpose processors such as CPUs and GPGPUs cannot implement LSTM-RNNs efficiently because of their recurrent nature. FPGA-based accelerators have attracted researchers' attention because of their good performance, high energy efficiency, and great flexibility. In this work, we present an FPGA-based accelerator for LSTM-RNNs that optimizes both computational performance and communication requirements. Our accelerator achieves a peak performance of 7.26 GFLOP/s, significantly outperforming previous approaches.
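
For context, the per-timestep cell update below is the core computation of an LSTM-RNN; the dependence of each step's hidden and cell states on the previous step is what limits parallelism on general-purpose processors. This is a minimal NumPy sketch of the standard LSTM equations for illustration only, not the accelerator's datapath; the weight, bias, and gate names follow the conventional formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (standard formulation; names are illustrative).

    W, U, b are dicts keyed by gate: 'i' (input), 'f' (forget),
    'o' (output), 'g' (cell candidate).
    """
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate cell state
    c_t = f * c_prev + i * g                               # new cell state
    h_t = o * np.tanh(c_t)                                 # new hidden state
    return h_t, c_t
```

Because h_t and c_t feed directly into the next call of lstm_step, the time steps must be evaluated sequentially; an accelerator therefore gains performance mainly by parallelizing the matrix-vector products within a single step and by keeping weights close to the compute units.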
