FPGA-based accelerator for long short-term memory recurrent neural networks

Long Short-Term Memory recurrent neural networks (LSTM-RNNs) have been widely used for speech recognition, machine translation, scene analysis, and other sequence-processing tasks. Unfortunately, general-purpose processors such as CPUs and GPGPUs cannot implement LSTM-RNNs efficiently because of their recurrent nature. FPGA-based accelerators have attracted researchers' attention because of their good performance, high energy efficiency, and great flexibility. In this work, we present an FPGA-based accelerator for LSTM-RNNs that optimizes both computational performance and communication requirements. Our accelerator achieves a peak performance of 7.26 GFLOP/s, significantly outperforming previous approaches.
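
For context, the per-timestep cell update below is the core computation of an LSTM-RNN; the dependence of each step's hidden and cell states on the previous step is what limits parallelism on general-purpose processors. This is a minimal NumPy sketch of the standard LSTM equations for illustration only, not the accelerator's datapath; the weight, bias, and gate names follow the conventional formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (standard formulation; names are illustrative).

    W, U, b are dicts keyed by gate: 'i' (input), 'f' (forget),
    'o' (output), 'g' (cell candidate).
    """
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate cell state
    c_t = f * c_prev + i * g                               # new cell state
    h_t = o * np.tanh(c_t)                                 # new hidden state
    return h_t, c_t
```

Because h_t and c_t feed directly into the next call of lstm_step, the time steps must be evaluated sequentially; an accelerator therefore gains performance mainly by parallelizing the matrix-vector products within a single step and by keeping weights close to the compute units.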
