Improving speech understanding accuracy with limited training data using multiple language models and multiple understanding models

We aim to improve a speech understanding module with a small amount of training data. A speech understanding module uses a language model (LM) and a language understanding model (LUM). Improving these models usually requires a large amount of training data, but collecting such data is difficult in an actual development process. We therefore design and develop a new framework that uses multiple LMs and LUMs to improve speech understanding accuracy under various amounts of training data. Even when the available training data are limited, each LM and each LUM handles different types of utterances well, so using multiple LMs and LUMs allows more utterances to be understood correctly. As one implementation of the framework, we develop a method for selecting the most appropriate speech understanding result from several candidates. The selection is based on probabilities of correctness calculated by logistic regressions. We evaluate our framework with various amounts of training data.

Index Terms: speech understanding, multiple language models and language understanding models, limited training data
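As a rough illustration of the selection step described above, the following is a minimal sketch in Python using scikit-learn: one logistic regression per LM/LUM pair estimates the probability that the pair's understanding result is correct, and the candidate with the highest probability is chosen. The feature layout and class structure are assumptions for illustration, not the authors' exact design.

```python
# Minimal sketch: candidate selection among multiple LM/LUM pairs.
# Feature contents (e.g., ASR confidence, understanding score) are assumed,
# not taken from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression


class CandidateSelector:
    """Selects one understanding result among candidates produced by
    several LM/LUM pairs, using a per-pair logistic regression that
    estimates the probability that the pair's result is correct."""

    def __init__(self, n_pairs):
        self.models = [LogisticRegression() for _ in range(n_pairs)]

    def fit(self, features_per_pair, correct_per_pair):
        # features_per_pair[i]: (n_utterances, n_features) array for pair i
        # correct_per_pair[i]:  1 if pair i understood the utterance correctly, else 0
        for model, X, y in zip(self.models, features_per_pair, correct_per_pair):
            model.fit(X, y)

    def select(self, candidate_features):
        # candidate_features[i]: feature vector for pair i's result on one utterance
        probs = [m.predict_proba(np.asarray(x).reshape(1, -1))[0, 1]
                 for m, x in zip(self.models, candidate_features)]
        return int(np.argmax(probs))  # index of the LM/LUM pair whose result is chosen
```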
