Paper information - Time-step interleaved weight reuse for LSTM neural network computing

Time-step interleaved weight reuse for LSTM neural network computing

In Long Short-Term Memory (LSTM) neural network models, a weight matrix tends to be loaded repeatedly from DRAM when the processor's on-chip storage is too small to hold the entire matrix. To alleviate the heavy DRAM access overhead of weight loading in LSTM computations, we propose a weight reuse scheme that exploits the weight sharing between the computations of two adjacent time steps. Experimental results show that the proposed weight reuse scheme reduces energy consumption by 28.4-57.3% and increases overall throughput by 110.8% compared to conventional schemes.
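The idea of letting one loaded weight tile serve two adjacent time steps can be illustrated with a toy model. The sketch below is not the paper's scheduling: it covers only the input projection W_x x_t, which does not depend on the previous hidden state, and it counts one simulated DRAM load per tile. The function name `project_inputs`, the tile layout, and the parameter `steps_per_load` are assumptions made for illustration; the recurrent part of an LSTM needs more careful ordering that is not modeled here.

```python
def project_inputs(weight_tiles, inputs, steps_per_load=1):
    """Compute W @ x_t for every time step while streaming row tiles of W.

    weight_tiles: list of tiles, each tile a list of weight rows (hypothetical layout).
    inputs: list of input vectors, one per time step.
    steps_per_load: how many adjacent time steps share one loaded tile.
    Returns (outputs, loads), where loads counts simulated DRAM tile loads.
    """
    if steps_per_load < 1:
        raise ValueError("steps_per_load must be at least 1")
    outputs = [[] for _ in inputs]
    loads = 0
    for start in range(0, len(inputs), steps_per_load):
        group = range(start, min(start + steps_per_load, len(inputs)))
        for tile in weight_tiles:
            loads += 1  # the tile is fetched once for the whole group of time steps
            for t in group:
                outputs[t].extend(
                    sum(w * x for w, x in zip(row, inputs[t])) for row in tile
                )
    return outputs, loads


# Two tiles of two rows each, three time steps (made-up numbers).
tiles = [[[1, 0], [0, 1]], [[1, 1], [2, -1]]]
xs = [[1, 2], [3, 4], [5, 6]]

per_step, per_step_loads = project_inputs(tiles, xs, steps_per_load=1)
paired, paired_loads = project_inputs(tiles, xs, steps_per_load=2)

assert per_step == paired  # same results, fewer tile loads
print(per_step_loads, paired_loads)
```

In this toy setting the results are identical in both modes, and only the number of tile fetches changes. The actual savings in a real accelerator depend on factors the sketch ignores, such as the recurrent dependency, buffer sizes, and the energy cost of each access.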

Yulhwa Kim | Jae-Joon Kim | Daehyun Ahn | Taesu Kim | Naebeom Park
