Automatic Allocation of Training Data for Speech Understanding Based on Multiple Model Combinations

SUMMARY The optimal way to build speech understanding modules depends on the amount of training data available. When only a small amount is available, allocating it effectively is crucial to preventing overfitting of statistical methods. We have developed a method for allocating a limited amount of training data in accordance with the amount available. Our method exploits the rule-based methods included in our speech understanding framework, which is based on multiple model combinations, i.e., multiple automatic speech recognition (ASR) modules and multiple language understanding (LU) modules, when the amount of data is small, and allocates training data preferentially to the modules that dominate the overall speech understanding performance. Experimental evaluation showed that our allocation method consistently outperforms baseline methods that use a single ASR module and a single LU module.
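To make the allocation idea concrete, the following is a minimal Python sketch of one plausible greedy allocation loop: each batch of new training data is tentatively given to every module, the combination is re-evaluated on a development set, and the batch is committed to the module whose retraining yields the largest overall gain. The names (Module, allocate, evaluate, BATCH) and the batch-wise greedy search are illustrative assumptions, not the paper's exact procedure, and the rule-based fallback for very small data is omitted here.

    # Hypothetical sketch: greedy allocation of training data across
    # multiple ASR/LU modules. All identifiers are assumptions for
    # illustration; the paper's actual algorithm may differ in detail.
    from typing import Callable, List

    BATCH = 100  # hypothetical allocation unit (number of utterances)

    class Module:
        """A trainable ASR or LU module in the combination framework."""
        def __init__(self, name: str):
            self.name = name
            self.data: List[str] = []

        def train(self, batch: List[str]) -> None:
            # Retrain on the enlarged data set (stub: just store the data).
            self.data.extend(batch)

    def allocate(modules: List[Module],
                 pool: List[str],
                 evaluate: Callable[[List[Module]], float]) -> None:
        """Greedily give each batch to the module whose retraining most
        improves overall understanding accuracy on a development set."""
        while pool:
            batch, pool = pool[:BATCH], pool[BATCH:]
            best, best_score = None, float("-inf")
            for m in modules:
                saved = list(m.data)
                m.train(batch)              # tentatively retrain this module
                score = evaluate(modules)   # dev-set accuracy of the combination
                if score > best_score:
                    best, best_score = m, score
                m.data = saved              # roll back the trial
            best.train(batch)               # commit the batch to the winner

Under this sketch, evaluate would score the full multi-model combination (e.g., concept error rate on held-out dialogues), so data naturally flows to whichever ASR or LU module currently limits overall performance.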
