M2H-GAN: A GAN-based Mapping from Machine to Human Transcripts for Speech Understanding

Deep learning is at the core of recent spoken language understanding (SLU) related tasks. More precisely, deep neural networks (DNNs) drastically increased the performances of SLU systems, and numerous architectures have been proposed. In the real-life context of theme identification of telephone conversations, it is common to hold both a human, manual (TRS) and an automatically transcribed (ASR) versions of the conversations. Nonetheless, and due to production constraints, only the ASR transcripts are considered to build automatic classifiers. TRS transcripts are only used to measure the performances of ASR systems. Moreover, the recent performances in term of classification accuracy, obtained by DNN related systems are close to the performances reached by humans, and it becomes difficult to further increase the performances by only considering the ASR transcripts. This paper proposes to distillates the TRS knowledge available during the training phase within the ASR representation, by using a new generative adversarial network called M2H-GAN to generate a TRS-like version of an ASR document, to improve the theme identification performances.

[1]  Mohamed Morchid,et al.  Deep Stacked Autoencoders for Spoken Language Understanding , 2016, INTERSPEECH.

[2]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[3]  Sandeep Subramanian,et al.  Adversarial Generation of Natural Language , 2017, Rep4NLP@ACL.

[4]  Titouan Parcollet,et al.  Quaternion Convolutional Neural Networks For Theme Identification Of Telephone Conversations , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).

[5]  Mohamed Morchid,et al.  Denoised Bottleneck Features From Deep Autoencoders for Telephone Conversation Analysis , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[6]  Titouan Parcollet,et al.  Deep quaternion neural networks for spoken language understanding , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

[7]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  David Berthelot,et al.  BEGAN: Boundary Equilibrium Generative Adversarial Networks , 2017, ArXiv.

[9]  Yongqiang Wang,et al.  Towards End-to-end Spoken Language Understanding , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[10]  Georges Linarès,et al.  The LIA Speech Recognition System: From 10xRT to 1xRT , 2007, TSD.

[11]  Alan Ritter,et al.  Adversarial Learning for Neural Dialogue Generation , 2017, EMNLP.

[12]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[13]  Titouan Parcollet,et al.  Quaternion Neural Networks for Spoken Language Understanding , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[14]  Hyunsoo Kim,et al.  Learning to Discover Cross-Domain Relations with Generative Adversarial Networks , 2017, ICML.

[15]  Gokhan Tur,et al.  Spoken Language Understanding: Systems for Extracting Semantic Information from Speech , 2011 .

[16]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[17]  Titouan Parcollet,et al.  Quaternion Denoising Encoder-Decoder for Theme Identification of Telephone Conversations , 2017, INTERSPEECH.

[18]  Mohamed Morchid,et al.  Theme identification in telephone service conversations using quaternions of speech features , 2013, INTERSPEECH.

[19]  Tie-Yan Liu,et al.  Adversarial Neural Machine Translation , 2017, ACML.

[20]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[21]  Anna Rumshisky,et al.  Adversarial Text Generation Without Reinforcement Learning , 2018, ArXiv.

[22]  Joelle Pineau,et al.  Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models , 2015, AAAI.

[23]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[24]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[25]  Wei Chen,et al.  Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets , 2017, NAACL.

[26]  Gökhan Tür,et al.  End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding , 2016, INTERSPEECH.

[27]  Geoffrey E. Hinton,et al.  Application of Deep Belief Networks for Natural Language Understanding , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[28]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[29]  Augustus Odena,et al.  Semi-Supervised Learning with Generative Adversarial Networks , 2016, ArXiv.