Error Analysis Applied to End-to-End Spoken Language Understanding

This paper presents a qualitative study of the errors produced by an end-to-end spoken language understanding (SLU) system (speech signal to concepts) that reaches state-of-the-art performance. Several analyses are carried out to better understand the weaknesses of such systems: a comparison to a classical pipeline SLU system, a study of the causes of concept deletions (the most frequent error type), an examination of the end-to-end SLU system's difficulty in correctly segmenting concepts, an analysis of the system's behavior on unseen concept/value pairs, and an analysis of the benefits of the curriculum-based transfer learning approach. Finally, we propose a way to compute embeddings of sub-sequences that appear to carry relevant information for future work.
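The deletion/substitution/insertion breakdown referred to above is typically obtained by aligning the hypothesized concept sequence against the reference with an edit-distance alignment. The sketch below is a generic illustration of that tallying step, not the authors' exact tooling; the concept labels in the usage example (e.g. "localisation-ville", "nombre-chambre") are merely indicative of MEDIA-style annotations.

```python
# Minimal sketch (illustrative, not the paper's implementation): count
# concept-level deletions, insertions, and substitutions by aligning a
# hypothesis concept sequence against the reference via edit distance.

from typing import Dict, List


def align_concepts(ref: List[str], hyp: List[str]) -> Dict[str, int]:
    """Levenshtein alignment of concept sequences; returns error counts."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimal edit cost aligning ref[:i] with hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dele = dp[i - 1][j] + 1   # reference concept missing from hypothesis
            ins = dp[i][j - 1] + 1    # spurious concept in hypothesis
            dp[i][j] = min(sub, dele, ins)

    # Backtrack through the cost matrix to attribute each error type.
    counts = {"correct": 0, "substitutions": 0, "deletions": 0, "insertions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            counts["correct" if ref[i - 1] == hyp[j - 1] else "substitutions"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            counts["deletions"] += 1
            i -= 1
        else:
            counts["insertions"] += 1
            j -= 1
    return counts


if __name__ == "__main__":
    reference = ["commande", "localisation-ville", "nombre-chambre"]
    hypothesis = ["commande", "nombre-chambre"]  # one concept deleted
    print(align_concepts(reference, hypothesis))
    # -> {'correct': 2, 'substitutions': 0, 'deletions': 1, 'insertions': 0}
```

Summing such counts over a test set gives the concept error rate and makes it easy to see, for instance, that deletions dominate the end-to-end system's errors.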
