Paper information - Acceleration of Deep Recurrent Neural Networks with an FPGA cluster

In this paper, we propose an acceleration methodology for deep recurrent neural networks (RNNs) implemented on a multi-FPGA platform called Flow-in-Cloud (FiC). RNNs have proven effective for modeling temporal sequences such as human speech and written text. However, implementing RNNs on traditional hardware is inefficient because of their long-range dependencies and irregular computation patterns. On traditional platforms such as CPUs, this inefficiency shows up as a run time that grows in proportion to the number of layers of a deep RNN. Previous works have mostly focused on optimizing a single RNN cell. In this work, we take advantage of the multi-FPGA system to demonstrate that the run time of a deep RNN with k layers can be reduced from O(k) to O(1).
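The abstract does not spell out how the O(k) to O(1) reduction is obtained. One common way to get this kind of result on a multi-device platform is layer-wise pipelining: map each of the k layers to its own FPGA, so that layer l can process time step t while layer l+1 processes step t-1. The Python sketch below illustrates only that scheduling argument under a deliberately simple model, assuming hypothetical scalar Elman-style cells, one time unit per cell update, and zero inter-FPGA communication cost. It should not be read as the authors' FiC implementation; the names `rnn_cell`, `run_sequential`, and `run_pipelined` and all weights are hypothetical.

```python
import math


def rnn_cell(weights, x, h_prev):
    """Hypothetical scalar Elman-style cell: h = tanh(w_x * x + w_h * h_prev + b)."""
    w_x, w_h, b = weights
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_sequential(layers, inputs):
    """Single device: every (layer, time step) cell is evaluated one after another."""
    k, T = len(layers), len(inputs)
    h = [[0.0] * T for _ in range(k)]  # h[l][t] = hidden state of layer l at step t
    steps = 0
    for t in range(T):
        for l in range(k):
            x = inputs[t] if l == 0 else h[l - 1][t]
            h_prev = h[l][t - 1] if t > 0 else 0.0
            h[l][t] = rnn_cell(layers[l], x, h_prev)
            steps += 1  # one cell evaluation per time unit
    return h[-1], steps


def run_pipelined(layers, inputs):
    """Assumed one-layer-per-FPGA pipeline: in tick s, layer l handles step t = s - l."""
    k, T = len(layers), len(inputs)
    h = [[0.0] * T for _ in range(k)]
    done = [[None] * T for _ in range(k)]  # tick in which each cell finished
    ticks = 0
    for s in range(T + k - 1):
        active = [(l, s - l) for l in range(k) if 0 <= s - l < T]
        for l, t in active:  # conceptually concurrent: each layer on its own device
            # Both inputs were produced in an earlier tick, so the active cells
            # of one tick do not depend on each other.
            assert l == 0 or done[l - 1][t] < s
            assert t == 0 or done[l][t - 1] < s
            x = inputs[t] if l == 0 else h[l - 1][t]
            h_prev = h[l][t - 1] if t > 0 else 0.0
            h[l][t] = rnn_cell(layers[l], x, h_prev)
            done[l][t] = s
        ticks += 1  # all active layers advance together in one time unit
    return h[-1], ticks


if __name__ == "__main__":
    # k = 4 hypothetical layers, T = 20 hypothetical inputs
    layers = [(0.5, 0.8, 0.1), (0.9, -0.3, 0.0), (-0.4, 0.6, 0.2), (0.7, 0.7, -0.1)]
    inputs = [0.1 * i for i in range(20)]
    out_seq, steps = run_sequential(layers, inputs)
    out_pipe, ticks = run_pipelined(layers, inputs)
    print(steps, ticks)  # 80 23
    print(out_seq == out_pipe)  # True
```

Under this model the single-device schedule needs k * T time units (80 for k = 4, T = 20), while the pipelined schedule needs T + k - 1 (23) and produces identical outputs. Once the pipeline is full, each new time step costs one tick regardless of depth, so for long sequences (T much larger than k) the run time is essentially independent of k, which is one way to read the O(k) to O(1) claim.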

Hideharu Amano | Akram Ben Ahmed | Yuxi Sun