Language model switching based on topic detection for dialog speech recognition

An efficient, scalable speech recognition architecture is proposed for multi-domain dialog systems that combines topic detection with topic-dependent language modeling. The domain of each user utterance is detected automatically, and speech recognition is then performed with the appropriate domain-dependent language model. The architecture improves accuracy and efficiency over current approaches and scales to a large number of domains. In this paper, a novel framework using a multilayer hierarchy of language models is introduced to improve robustness against topic detection errors. The proposed system provides a 10.5% relative reduction in word error rate (WER) over a single language model system. Furthermore, it achieves accuracy comparable to running multiple language models in parallel at only a fraction of the computational cost.
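
As a rough illustration of the switching idea described in the abstract, the sketch below scores an utterance against toy topic-dependent language models, falls back to a general model when no topic wins clearly (a stand-in for the robustness the hierarchy provides), and returns the model that would be used for recognition. The topic names, corpora, unigram scoring, and margin threshold are all assumptions for illustration, not the paper's actual models or data.

```python
# Minimal sketch of topic detection + language model switching (illustrative only).
from collections import Counter
import math


class UnigramLM:
    """Toy unigram language model with add-one smoothing."""

    def __init__(self, corpus):
        tokens = [w for sent in corpus for w in sent.lower().split()]
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab = set(tokens)

    def log_prob(self, sentence):
        v = len(self.vocab) + 1
        return sum(
            math.log((self.counts.get(w, 0) + 1) / (self.total + v))
            for w in sentence.lower().split()
        )


# Hypothetical topic-dependent training corpora (stand-ins for real domain data).
corpora = {
    "hotel": ["i would like to book a room", "is breakfast included in the rate"],
    "transport": ["which bus goes to the airport", "how much is a ticket to the station"],
}
general_lm = UnigramLM([s for sents in corpora.values() for s in sents])  # fallback model
topic_lms = {topic: UnigramLM(sents) for topic, sents in corpora.items()}


def detect_topic(utterance, margin=1.0):
    """Pick the topic whose LM scores the utterance highest; fall back to the
    general model when no topic wins by a clear margin."""
    scored = sorted(((lm.log_prob(utterance), t) for t, lm in topic_lms.items()), reverse=True)
    best, runner_up = scored[0], scored[1]
    if best[0] - runner_up[0] < margin:
        return "general", general_lm
    return best[1], topic_lms[best[1]]


topic, lm = detect_topic("i want to book a room near the station")
print(topic)  # the selected topic; its LM would then be used for recognition
```

In the real system a single shared acoustic pass with a domain-dependent language model replaces running every domain's recognizer in parallel, which is where the computational savings come from.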
