System combination for out-of-vocabulary word detection

This paper presents a method to improve out-of-vocabulary (OOV) word detection by combining the outputs of multiple speech recognition systems. Three fragment-word hybrid systems (phone, subword, and graphone) were built to detect OOV words, and their outputs were then combined using ROVER. Two combination metrics were explored in ROVER: voting by word frequency alone, and voting by both word frequency and word confidence score. The experimental results show that ROVER with confidence scores achieves better OOV word detection performance than ROVER with word frequency alone, and better than any of the individual hybrid systems.
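The two voting metrics can be illustrated with a minimal sketch of ROVER's voting stage. It assumes the system outputs have already been aligned into a word transition network (ROVER's alignment stage, not shown), so each slot holds one (word, confidence) pair per system. It also assumes each hybrid system's fragment sequences have already been collapsed into a single hypothetical `<OOV>` token. Each candidate word w in a slot is scored as Score(w) = α · N(w)/N_s + (1 − α) · C(w). Here N(w) is the number of systems voting for w, N_s is the number of systems, and C(w) is their confidence, averaged here, though ROVER also allows the maximum. Setting α = 1 gives frequency-only voting. The names `rover_vote`, `rover_combine`, `NULL`, and all example values are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

NULL = "@"  # ROVER's null-arc symbol: the system output nothing in this slot

Slot = Sequence[Tuple[str, float]]  # one (word, confidence) pair per system


def rover_vote(slot: Slot, alpha: float = 1.0, null_conf: float = 0.0) -> str:
    """Return the highest-scoring word in one aligned correspondence set.

    alpha = 1.0 -> voting by word frequency only.
    alpha < 1.0 -> voting by frequency mixed with (average) confidence.
    null_conf is the confidence assigned to null arcs; in practice it
    would be tuned on held-out data together with alpha.
    """
    n_systems = len(slot)
    counts = defaultdict(int)
    confs = defaultdict(list)
    for word, conf in slot:
        counts[word] += 1
        confs[word].append(null_conf if word == NULL else conf)

    def score(word: str) -> float:
        freq = counts[word] / n_systems
        avg_conf = sum(confs[word]) / len(confs[word])
        return alpha * freq + (1.0 - alpha) * avg_conf

    # Ties go to the word seen first in the slot (dict insertion order).
    return max(counts, key=score)


def rover_combine(network: Sequence[Slot], alpha: float = 1.0,
                  null_conf: float = 0.0) -> List[str]:
    """Vote slot by slot and drop slots whose winner is a null arc."""
    hyp = []
    for slot in network:
        best = rover_vote(slot, alpha, null_conf)
        if best != NULL:
            hyp.append(best)
    return hyp


# Hypothetical aligned outputs of three systems (e.g. phone, subword, graphone).
example_network = [
    [("the", 0.9), ("the", 0.8), ("the", 0.95)],
    [("pole", 0.3), ("pole", 0.2), ("<OOV>", 0.9)],  # two low-confidence IV guesses
    [("is", 0.9), (NULL, 0.0), ("is", 0.7)],
    [(NULL, 0.0), (NULL, 0.0), ("uh", 0.2)],
]

print(rover_combine(example_network, alpha=1.0))  # frequency only
print(rover_combine(example_network, alpha=0.5))  # frequency + confidence
```

With α = 1.0 the majority wins the second slot and the in-vocabulary substitution "pole" survives. With α = 0.5 the single high-confidence `<OOV>` vote outweighs the two low-confidence ones, which mirrors the kind of gain the confidence-based metric is meant to provide.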
