Time-step interleaved weight reuse for LSTM neural network computing

In Long Short-Term Memory (LSTM) neural network models, a weight matrix tends to be repeatedly loaded from DRAM if the size of on-chip storage of the processor is not large enough to store the entire matrix. To alleviate heavy overhead of DRAM access for weight loading in LSTM computations, we propose a weight reuse scheme which utilizes the weight sharing characteristics in two adjacent time-step computations. Experimental results show that the proposed weight reuse scheme reduces the energy consumption by 28.4-57.3% and increases the overall throughput by 110.8% compared to the conventional schemes.

[1]  Li Tian,et al.  Efficient Weight Reuse for Large LSTMs , 2019, 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP).

[2]  Zhongfeng Wang,et al.  E-LSTM: An Efficient Hardware Architecture for Long Short-Term Memory , 2019, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[3]  Jing Wang,et al.  Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4]  Jae-Joon Kim,et al.  Maximizing system performance by balancing computation loads in LSTM accelerators , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[5]  Ruslan Salakhutdinov,et al.  Breaking the Softmax Bottleneck: A High-Rank RNN Language Model , 2017, ICLR.

[6]  David A. Patterson,et al.  In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[7]  Christoforos E. Kozyrakis,et al.  TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.

[8]  Richard Socher,et al.  Pointer Sentinel Mixture Models , 2016, ICLR.

[9]  Chong Wang,et al.  Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.

[10]  Sanjeev Khudanpur,et al.  Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Jung Ho Ahn,et al.  CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques , 2011, 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[12]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[13]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[14]  Carla Teixeira Lopes,et al.  TIMIT Acoustic-Phonetic Continuous Speech Corpus , 2012 .