Real to H-Space Autoencoders for Theme Identification in Telephone Conversations

Machine learning (ML), and in particular deep learning with deep neural networks (DNNs), has drastically improved the performance of modern systems on numerous spoken language understanding (SLU) tasks. While most current research focuses on new neural architectures to improve performance in realistic conditions, a few recent works have investigated the use of different algebras with neural networks (NNs) to better represent the nature of the data being processed. To this end, quaternion-valued neural networks (QNNs) have shown better performance, and a significant reduction in the number of neural parameters, compared to traditional real-valued neural networks when dealing with multidimensional signals. Nonetheless, the use of QNNs is strictly limited to quaternion input or output features. This article introduces a new unsupervised method based on a hybrid autoencoder (AE), called the real-to-quaternion autoencoder (R2H), which extracts a quaternion-valued input signal from any real-valued data so that it can be processed by QNNs. Experiments on identifying the most related theme of a given telephone conversation from a customer care service (CCS) demonstrate that the R2H approach outperforms all previously established models, whether real- or quaternion-valued, in terms of accuracy, and does so with up to four times fewer neural parameters.
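
To make the parameter argument concrete, the following is a minimal sketch, not the authors' implementation. It assumes quaternions are stored as NumPy arrays `(r, i, j, k)` and that a quaternion dense layer shares one quaternion weight per input/output pair through the Hamilton product; the function names `hamilton_product`, `real_dense_params`, and `quaternion_dense_params` are hypothetical and introduced only for illustration. Bias terms are ignored.

```python
import numpy as np


def hamilton_product(p, q):
    """Hamilton product of two quaternions stored as (r, i, j, k)."""
    r1, x1, y1, z1 = p
    r2, x2, y2, z2 = q
    return np.array([
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,  # real part
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,  # i component
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,  # j component
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,  # k component
    ])


def real_dense_params(n_in_quat, n_out_quat):
    """Weights of a real dense layer over the flattened 4-component features."""
    return (4 * n_in_quat) * (4 * n_out_quat)


def quaternion_dense_params(n_in_quat, n_out_quat):
    """Weights of a quaternion dense layer: one quaternion (4 reals) per connection."""
    return 4 * n_in_quat * n_out_quat


# Assumed layer sizes, chosen only for illustration.
n_in, n_out = 64, 32
ratio = real_dense_params(n_in, n_out) / quaternion_dense_params(n_in, n_out)
print(ratio)
```

Under these assumptions, a quaternion layer connecting the same number of real-valued components needs a quarter of the weights of its real-valued counterpart, which is consistent with the "up to four times fewer" figure quoted above.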
