Deep quaternion neural networks for spoken language understanding

Deep Neural Networks (DNN) have received great interest from researchers due to their capability to construct robust abstract representations of heterogeneous documents in a latent subspace. Nonetheless, real-valued deep neural networks require an appropriate adaptation, such as the convolution process, to capture latent relations between input features. Moreover, real-valued deep neural networks reveal little in the way of internal document dependencies, since they consider the words or topics contained in a document as isolated basic elements. Quaternion-valued multi-layer perceptrons (QMLP) and autoencoders (QAE) have been introduced to capture such latent dependencies as well as to represent multidimensional data. Nonetheless, a three-layered neural network does not benefit from the high abstraction capability of DNNs. This paper first proposes to extend hyper-complex algebra to deep neural networks (QDNN) and then introduces pre-trained deep quaternion neural networks (QDNN-AE) with dedicated quaternion encoder-decoders (QAE). The experiments conducted on a theme identification task of spoken dialogues from the DECODA data set show, inter alia, that the QDNN-AE achieves a promising gain of 2.2% over the standard real-valued DNN-AE.
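To make the quaternion-layer idea concrete, the following is a minimal NumPy sketch of a quaternion-valued dense layer built on the Hamilton product, the operation that ties the four components of each weight and input together and thereby captures the internal dependencies mentioned above. The layer sizes, the component-wise (split) tanh activation, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hamilton_product(q, w):
    """Hamilton product of two quaternions given as (r, x, y, z) arrays."""
    r1, x1, y1, z1 = q
    r2, x2, y2, z2 = w
    return np.array([
        r1*r2 - x1*x2 - y1*y2 - z1*z2,  # real part
        r1*x2 + x1*r2 + y1*z2 - z1*y2,  # i component
        r1*y2 - x1*z2 + y1*r2 + z1*x2,  # j component
        r1*z2 + x1*y2 - y1*x2 + z1*r2,  # k component
    ])

def quaternion_dense(inputs, weights, activation=np.tanh):
    """One quaternion-valued dense layer (sketch).

    inputs : (n_in, 4) array, one quaternion per input unit
    weights: (n_out, n_in, 4) array, one quaternion per connection
    Returns (n_out, 4): each output unit is the activated sum of
    Hamilton products between its weights and the inputs.
    """
    n_out = weights.shape[0]
    out = np.zeros((n_out, 4))
    for j in range(n_out):
        acc = np.zeros(4)
        for i in range(inputs.shape[0]):
            acc += hamilton_product(weights[j, i], inputs[i])
        out[j] = acc
    # Split activation applied component-wise, a common choice in QMLPs.
    return activation(out)

# Toy forward pass through two stacked quaternion layers (a small "deep" QMLP).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))             # 8 quaternion input features (hypothetical)
w1 = rng.normal(size=(16, 8, 4)) * 0.1  # hypothetical hidden layer of 16 quaternion units
w2 = rng.normal(size=(4, 16, 4)) * 0.1  # hypothetical output layer of 4 quaternion units
h = quaternion_dense(x, w1)
y = quaternion_dense(h, w2)
print(y.shape)  # (4, 4): 4 output quaternions
```

Stacking several such layers, and optionally pre-training them with a quaternion encoder-decoder, is the deep extension the paper refers to as QDNN and QDNN-AE; the sketch above only illustrates the forward pass.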
