DC-LSTM: Deep Compressed LSTM with Low Bit-Width and Structured Matrices

Long Short-Term Memory (LSTM) networks have been widely adopted in many sequential applications, such as language modeling and speech recognition. However, LSTM usually incurs a large memory requirement and high computational complexity, which limits its applicability to embedded and mobile systems. In LSTM, matrix-vector multiplication (MV) accounts for most of the operations and storage. In this paper, we present a software-hardware co-design scheme for efficiently compressing MVs. By combining a structured matrix, quantization, and selective top-k pruning, memory requirements are substantially reduced while incurring only a negligible accuracy loss. We then propose a block-parallel hardware architecture for the compressed LSTM. Requiring fewer multiplication operations and storage resources, the proposed architecture achieves a high compression ratio. The proposed architecture is implemented on the Xilinx VCU118 and KC705 platforms. Experimental results show that the proposed design reduces DSP and BRAM resource usage.
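To make the three named compression techniques concrete, the sketch below illustrates in plain NumPy what each step could look like: top-k pruning per row, low bit-width uniform quantization, and a structured (here, circulant) matrix whose MV can be computed via FFT. The function names, shapes, bit-width, and the choice of a circulant structure are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def top_k_prune(W, k):
    """Keep only the k largest-magnitude entries per row; zero the rest.
    Illustrative stand-in for the paper's selective top-k pruning."""
    out = np.zeros_like(W)
    idx = np.argsort(np.abs(W), axis=1)[:, -k:]   # top-k indices per row
    rows = np.arange(W.shape[0])[:, None]
    out[rows, idx] = W[rows, idx]
    return out

def quantize(W, bits):
    """Uniform symmetric quantization to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(W)) / qmax
    q = np.clip(np.round(W / scale), -qmax - 1, qmax)
    return q.astype(np.int8 if bits <= 8 else np.int16), scale

def circulant_mv(c, x):
    """MV with a circulant matrix defined by its first column c, via FFT:
    O(n log n) time and O(n) storage instead of O(n^2). Circulant is one
    common structured-matrix choice; the paper's structure may differ."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Toy LSTM gate weight matrix and input vector (assumed sizes)
W = np.random.randn(128, 256).astype(np.float32)
x = np.random.randn(256).astype(np.float32)

W_pruned = top_k_prune(W, k=32)          # 8x sparsity per row
W_q, s = quantize(W_pruned, bits=4)      # 4-bit weights plus one scale
y_approx = (W_q.astype(np.float32) * s) @ x   # compressed MV
```

In a hardware realization, only the quantized nonzero weights and their indices would be stored, which is where the DSP and BRAM savings reported in the experiments would come from.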