Combination of strongly and weakly constrained recognizers for reliable detection of OOVs

This paper addresses the detection of out-of-vocabulary (OOV) segments in the output of a large vocabulary continuous speech recognition (LVCSR) system. First, standard confidence measures derived from frame-based word and phone posteriors are investigated. Substantial improvement is obtained when posteriors from two systems are combined: a strongly constrained one (the LVCSR system) and a weakly constrained one (a phone posterior estimator). We show that this approach is also suitable for the detection of general recognition errors. All results are presented on the WSJ task with a reduced recognition vocabulary.
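To make the idea of combining the two posterior streams concrete, the following is a minimal sketch and not the method of the paper. It assumes that both systems provide per-frame phone posterior vectors over the same phone set, and it uses the average per-frame KL divergence between them over a hypothesized word segment as a mismatch score. The function names, the choice of KL divergence, and the toy posteriors are illustrative assumptions.

```python
import math


def frame_kl(p_strong, p_weak, eps=1e-10):
    """KL divergence D(p_strong || p_weak) for one frame of phone posteriors.

    eps guards against log(0); it is an assumed smoothing constant.
    """
    return sum(
        ps * math.log((ps + eps) / (pw + eps))
        for ps, pw in zip(p_strong, p_weak)
        if ps > 0.0
    )


def segment_oov_score(strong, weak, start, end):
    """Mean per-frame divergence over frames [start, end) of a hypothesized word.

    A larger value means the strongly and weakly constrained systems disagree
    more, which this sketch treats as evidence for an OOV or erroneous segment.
    """
    if end <= start:
        raise ValueError("segment must contain at least one frame")
    frames = range(start, end)
    return sum(frame_kl(strong[t], weak[t]) for t in frames) / len(frames)


# Toy data (hypothetical): 3 phone classes, 4 frames.
strong = [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.8, 0.1, 0.1]]
weak = [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05], [0.1, 0.8, 0.1], [0.05, 0.9, 0.05]]

agree = segment_oov_score(strong, weak, 0, 2)     # systems agree on frames 0-1
disagree = segment_oov_score(strong, weak, 2, 4)  # systems disagree on frames 2-3
print(f"agree={agree:.3f} disagree={disagree:.3f}")
```

In this toy example the second segment receives the higher score because the weakly constrained posteriors favor a different phone than the strongly constrained ones. In practice such a score would be one input among several to a trained classifier rather than a detector on its own.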
