
Improved language modeling for conversational applications using sentence quality

In this paper, we propose a new approach to building language models for conversational systems from a corpus of text, as opposed to a live or Wizard-of-Oz data collection. Each sentence in the corpus is assigned a “quality” that reflects the developer's intuition about how likely that sentence is to be spoken to the live system by a real user. A language model (LM) is built for each sentence quality, and these models are then interpolated to produce the final model. We have also built a classifier that assigns sentence qualities to the data automatically, and the language models built from its labels achieve similar improvements in word and turn error rates.
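The core recipe (one LM per quality class, linearly interpolated into a final model) can be illustrated with a minimal sketch. Several choices here are assumptions and not the paper's method. Add-k smoothed bigram models stand in for whatever n-gram order and smoothing the authors used. The quality labels `high`, `medium` and `low` and the toy sentences are hypothetical. The interpolation weights are estimated with EM on a small held-out set, a standard way to tune linear interpolation weights, because the abstract does not say how the weights were chosen. The automatic quality classifier is not sketched.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


def bigram_events(sentence):
    """Return the (previous word, word) pairs of one whitespace-tokenized sentence."""
    tokens = [BOS] + sentence.split() + [EOS]
    return list(zip(tokens, tokens[1:]))


class BigramLM:
    """Add-k smoothed bigram model over a shared, closed vocabulary (an assumption)."""

    def __init__(self, sentences, vocab, k=1.0):
        self.vocab, self.k = vocab, k
        self.bigrams, self.history = Counter(), Counter()
        for sentence in sentences:
            for prev, word in bigram_events(sentence):
                self.bigrams[(prev, word)] += 1
                self.history[prev] += 1

    def prob(self, prev, word):
        # Add-k smoothing keeps every vocabulary word at nonzero probability.
        return (self.bigrams[(prev, word)] + self.k) / (
            self.history[prev] + self.k * len(self.vocab))


class InterpolatedLM:
    """Linear interpolation: P(w | h) = sum_q lambda_q * P_q(w | h)."""

    def __init__(self, models, weights):
        assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
        self.models, self.weights = models, weights

    def prob(self, prev, word):
        return sum(self.weights[q] * m.prob(prev, word) for q, m in self.models.items())


def build_quality_lms(corpus, vocab, k=1.0):
    """Train one bigram LM per sentence-quality label."""
    buckets = defaultdict(list)
    for sentence, quality in corpus:
        buckets[quality].append(sentence)
    return {q: BigramLM(sents, vocab, k) for q, sents in buckets.items()}


def em_weights(models, heldout, iters=20):
    """Estimate interpolation weights on held-out text with EM (hypothetical tuning step)."""
    qualities = list(models)
    weights = {q: 1.0 / len(qualities) for q in qualities}
    events = [e for sentence in heldout for e in bigram_events(sentence)]
    for _ in range(iters):
        expected = dict.fromkeys(qualities, 0.0)
        for prev, word in events:
            # E-step: posterior share of each quality model for this bigram.
            parts = {q: weights[q] * models[q].prob(prev, word) for q in qualities}
            total = sum(parts.values())
            for q in qualities:
                expected[q] += parts[q] / total
        # M-step: each new weight is the average posterior share.
        weights = {q: expected[q] / len(events) for q in qualities}
    return weights


def perplexity(lm, sentences):
    events = [e for sentence in sentences for e in bigram_events(sentence)]
    log_prob = sum(math.log(lm.prob(prev, word)) for prev, word in events)
    return math.exp(-log_prob / len(events))


# Toy data: the sentences and quality labels are invented for illustration.
corpus = [
    ("turn on the tv", "high"),
    ("switch to channel five", "high"),
    ("please record the news", "medium"),
    ("could you kindly display the programme guide", "low"),
]
heldout = ["turn on channel five", "record the news"]
vocab = {w for s, _ in corpus for w in s.split()}
vocab |= {w for s in heldout for w in s.split()} | {EOS}

models = build_quality_lms(corpus, vocab)
weights = em_weights(models, heldout)
lm = InterpolatedLM(models, weights)
print({q: round(w, 3) for q, w in weights.items()})
print("held-out perplexity:", round(perplexity(lm, heldout), 2))
```

Because every quality model is normalized over the same vocabulary and the weights sum to 1, the interpolated model is itself a proper distribution. EM never decreases held-out likelihood, so the tuned weights should give a held-out perplexity no worse than uniform weights would.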

Bhuvana Ramabhadran | Mark E. Epstein | Rajesh Balchandran
