A spoken term detection framework for recovering out-of-vocabulary words using the web

Vocabulary restrictions in large vocabulary continuous speech recognition (LVCSR) systems mean that out-of-vocabulary (OOV) words are lost in the output. However, OOV words tend to be information-rich terms (often named entities), and their omission from the transcript hurts both usability and downstream NLP technologies such as machine translation or knowledge distillation. We propose a novel approach to OOV recovery that uses a spoken term detection (STD) framework. Given an identified OOV region in the LVCSR output, we recover the uttered OOVs by using contextual information and the vast, constantly updated vocabulary of the Web. Discovered words are integrated into the system output, recovering up to 40% of OOVs and reducing system error.
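
As a rough illustration of the recovery step, the sketch below ranks candidate words against the phone sequence decoded in an OOV region. It is not the system described here: the abstract does not specify the matching procedure, so a plain normalized edit distance over phone strings stands in for the STD search, and the names `edit_distance` and `rank_candidates` are hypothetical. The candidate dictionary is assumed to hold words harvested from Web text retrieved with the surrounding context words, with pronunciations supplied by some grapheme-to-phoneme model; both of those steps are outside the sketch.

```python
from typing import Dict, List, Tuple


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, phone_a in enumerate(a, start=1):
        curr = [i]
        for j, phone_b in enumerate(b, start=1):
            cost = 0 if phone_a == phone_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_candidates(region_phones: List[str],
                    candidates: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Sort candidate words by normalized phone distance to the OOV region.

    `candidates` maps each word to an assumed pronunciation (hypothetical
    input; in practice it would come from Web text plus a G2P model).
    Lower scores indicate a closer match.
    """
    scored = []
    for word, phones in candidates.items():
        dist = edit_distance(region_phones, phones)
        norm = dist / max(len(region_phones), len(phones), 1)
        scored.append((word, norm))
    return sorted(scored, key=lambda item: item[1])


# Toy usage with made-up phone strings (illustrative only).
region = ["k", "ae", "t"]
web_candidates = {"cat": ["k", "ae", "t"], "dog": ["d", "ao", "g"]}
best_word, best_score = rank_candidates(region, web_candidates)[0]
```

A real STD system would typically search a phonetic index or lattice with acoustic scores instead of comparing single best phone strings, so this should be read only as a simplified picture of matching Web-derived candidates to an OOV region.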
