Quaternion Convolutional Neural Networks for Theme Identification of Telephone Conversations

Quaternion convolutional neural networks (QCNNs) are powerful architectures for learning and modeling both the external dependencies that exist between neighboring features of an input vector and the internal latent dependencies within each feature. This paper evaluates the effectiveness of QCNNs on a realistic theme identification task involving spoken telephone conversations between agents and customers at the call center of the Paris transportation system (RATP). We show that QCNNs are better suited than real-valued CNNs to process multidimensional data and to encode internal dependencies: real-valued CNNs treat internal and external relations at the same level, since the components of an entity are processed independently. Experimental evidence shows that the proposed QCNN architecture consistently outperforms equivalent real-valued CNN models on the theme identification task of the DECODA corpus. The QCNN accuracies are also the best reported so far on this task, while the number of model parameters is reduced by a factor of 4.
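To make the contrast with real-valued CNNs concrete, the sketch below illustrates the Hamilton product that underlies quaternion layers: every output component mixes all four input components through a shared set of four weights, whereas a real-valued layer of the same size would treat the components independently with unrelated weights. This is a minimal NumPy illustration under standard quaternion-algebra assumptions, not the authors' implementation; the function name, shapes, and the 1x1 "convolution" setting are illustrative choices.

```python
# Minimal sketch of the Hamilton product used in quaternion layers (1x1 case).
# Shapes and names are illustrative assumptions, not the paper's code.
import numpy as np

def hamilton_product(w, x):
    """Hamilton product w (*) x for quaternions given as 4-tuples of arrays.

    w = (w_r, w_i, w_j, w_k), x = (x_r, x_i, x_j, x_k).
    Each output component mixes all four input components, which is how a
    quaternion layer encodes internal (intra-feature) dependencies.
    """
    w_r, w_i, w_j, w_k = w
    x_r, x_i, x_j, x_k = x
    r = w_r * x_r - w_i * x_i - w_j * x_j - w_k * x_k
    i = w_r * x_i + w_i * x_r + w_j * x_k - w_k * x_j
    j = w_r * x_j - w_i * x_k + w_j * x_r + w_k * x_i
    k = w_r * x_k + w_i * x_j - w_j * x_i + w_k * x_r
    return r, i, j, k

# A real-valued layer mapping the same 4-component features would need a full
# 4x4 weight block per input/output feature pair (16 free parameters); the
# quaternion layer reuses the same 4 weights in the structured pattern above,
# which is consistent with the ~4x parameter reduction reported for QCNNs.
rng = np.random.default_rng(0)
x = tuple(rng.standard_normal(8) for _ in range(4))  # one quaternion feature over 8 positions
w = tuple(rng.standard_normal(1) for _ in range(4))  # one quaternion weight, broadcast over positions
y = hamilton_product(w, x)
print([comp.shape for comp in y])  # four output components, each of shape (8,)
```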
