Auto-encodeurs pour la compréhension de documents parlés (Auto-encoders for Spoken Document Understanding)

Document representations based on neural embedding frameworks have recently shown significant improvements in various natural language processing tasks. In real application settings, the automatic transcription of spoken documents may contain several word errors, especially under very noisy conditions. This paper proposes an original representation of highly imperfect spoken documents based on the bottleneck features of a supervised deep autoencoder. The autoencoder takes advantage of both noisy automatic transcriptions and clean manual transcriptions to make the document representation more robust in a noisy environment. Results obtained on the DECODA dialogue theme classification task reach an accuracy of more than 83%, a significant gain of about 6%.

KEYWORDS: autoencoder, denoising, speech recognition, neural networks.
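The core idea (encode the noisy automatic transcription, train the network to reconstruct the clean manual transcription, then keep the bottleneck layer as the document representation) can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes a single sigmoid bottleneck layer instead of a deep stack, binary bag-of-words document vectors, squared-error loss and plain SGD, and it uses synthetic toy data rather than DECODA. The names `SupervisedAutoencoder`, `train_step` and `make_toy_pairs` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SupervisedAutoencoder:
    """Hypothetical single-bottleneck autoencoder: it encodes a noisy (ASR)
    document vector and is trained to reconstruct the clean (manual) one."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        s = 1.0 / math.sqrt(n_in)
        self.W1 = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-s, s) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        # Bottleneck features: h = sigmoid(W1 x + b1).
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        # Reconstruction: y = sigmoid(W2 h + b2).
        return [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
                for row, b in zip(self.W2, self.b2)]

    def loss(self, pairs):
        # Mean squared reconstruction error measured against the CLEAN targets.
        total = 0.0
        for noisy, clean in pairs:
            y = self.decode(self.encode(noisy))
            total += sum((yi - ci) ** 2 for yi, ci in zip(y, clean))
        return total / len(pairs)

    def train_step(self, noisy, clean, lr):
        # One SGD step: the input is the noisy vector, the target is the clean one.
        h = self.encode(noisy)
        y = self.decode(h)
        d_out = [2.0 * (yi - ci) * yi * (1.0 - yi) for yi, ci in zip(y, clean)]
        # Hidden-layer error, computed with W2 before it is updated.
        d_hid = [sum(d_out[k] * self.W2[k][j] for k in range(len(d_out)))
                 * h[j] * (1.0 - h[j]) for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.W2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(noisy):
                self.W1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj


def make_toy_pairs(n_docs, vocab_size, error_rate, seed=1):
    # Synthetic (noisy, clean) binary bag-of-words pairs; each word flag is
    # flipped with probability error_rate to mimic ASR word errors.
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_docs):
        clean = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(vocab_size)]
        noisy = [1.0 - v if rng.random() < error_rate else v for v in clean]
        pairs.append((noisy, clean))
    return pairs


pairs = make_toy_pairs(n_docs=40, vocab_size=12, error_rate=0.1)
model = SupervisedAutoencoder(n_in=12, n_hidden=4)
before = model.loss(pairs)
for _ in range(200):
    for noisy, clean in pairs:
        model.train_step(noisy, clean, lr=0.5)
after = model.loss(pairs)

# The bottleneck vector is the document representation handed to a classifier.
features = model.encode(pairs[0][0])
print(f"reconstruction loss {before:.3f} -> {after:.3f}; bottleneck size {len(features)}")
```

The supervision lies entirely in the choice of target: a plain autoencoder would reconstruct its own noisy input, whereas here the clean manual transcription is the target, so the bottleneck is pushed toward features that survive recognition errors. In this sketch the `features` vector would then be passed to a downstream theme classifier, which is not shown.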
