Name-aware speech recognition for interactive question answering

We show how interactivity in a voice-enabled question answering application can improve speech recognition. The user provides a target named entity before asking the question. We then build a named-entity-specific language model from the documents that contain the named entity. The question-specific model is obtained by merging this named-entity-specific model with a model built on a set of questions. We present experiments using the TREC question set on the AQUAINT corpus. The question-specific language model is compared with a baseline model built by merging a model of the AQUAINT corpus with a model of past TREC questions. On questions in which pronominal references are resolved, the question-specific model achieves a 32.2% reduction in word error rate compared with the baseline.
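
The model-building steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes unigram models, a simple substring match to select documents that mention the entity, and linear interpolation as the merging step. The function names and the `entity_weight` parameter are hypothetical and chosen only for illustration.

```python
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram probabilities over whitespace tokens."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def interpolate(model_a, model_b, weight_a):
    """Linear interpolation of two models (an assumed form of 'merging')."""
    vocab = set(model_a) | set(model_b)
    return {
        word: weight_a * model_a.get(word, 0.0)
        + (1.0 - weight_a) * model_b.get(word, 0.0)
        for word in vocab
    }


def question_specific_model(entity, documents, questions, entity_weight=0.5):
    """Merge an entity-specific model with a model built on questions.

    entity_weight is a hypothetical parameter, not a value from the paper.
    """
    # Keep only the documents that mention the target named entity.
    entity_docs = [doc for doc in documents if entity.lower() in doc.lower()]
    question_model = unigram_model(questions)
    if not entity_docs:
        # No document mentions the entity: fall back to the question model.
        return question_model
    return interpolate(unigram_model(entity_docs), question_model, entity_weight)
```

In practice the models would be higher-order n-gram models with smoothing, and the interpolation weight would be tuned on held-out data; the sketch only shows how the pieces fit together.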
