Bidirectional Quaternion Long Short-term Memory Recurrent Neural Networks for Speech Recognition

Recurrent neural networks (RNNs) are at the core of modern automatic speech recognition (ASR) systems. In particular, long short-term memory (LSTM) recurrent neural networks have achieved state-of-the-art results on many speech recognition tasks, thanks to their efficient representation of long- and short-term dependencies in sequences of inter-dependent features. Nonetheless, traditional real-valued representations only weakly capture the internal dependencies among the elements that compose a multidimensional feature. We propose a novel quaternion long short-term memory (QLSTM) recurrent neural network that accounts both for the external relations between the features composing a sequence and for these internal latent structural dependencies, by means of the quaternion algebra. QLSTMs are compared to LSTMs on a memory copy task and on a realistic speech recognition application with the Wall Street Journal (WSJ) dataset. The QLSTM achieves better performance in both experiments with up to 2.8 times fewer learning parameters, leading to a more expressive representation of the information.
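The internal dependencies mentioned above are captured because a quaternion layer mixes the four components of each feature through the Hamilton product rather than treating them as independent real values. As a minimal illustrative sketch (not the authors' implementation; the function name and array layout are assumptions), the Hamilton product of two quaternions q = r + xi + yj + zk can be written as:

```python
import numpy as np

def hamilton_product(q, p):
    """Hamilton product of quaternions stored as (r, x, y, z) arrays.

    Every output component depends on all four input components,
    which is how quaternion layers tie together the elements of a
    multidimensional feature.
    """
    r1, x1, y1, z1 = q
    r2, x2, y2, z2 = p
    return np.array([
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,  # real part
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,  # i part
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,  # j part
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,  # k part
    ])

# Basis identities of the quaternion algebra, e.g. i * j = k:
i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
print(hamilton_product(i, j))  # → [0. 0. 0. 1.], i.e. k
```

In a QLSTM, this product replaces the real-valued matrix-vector multiplications in the gate equations, which is also the source of the parameter reduction: one quaternion weight (4 real numbers) links two 4-dimensional quantities where a real-valued layer would need a 4×4 block (16 real numbers).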
