Word Error Rate Minimization Using an Integrated Confidence Measure

This paper describes a new criterion for speech recognition using an integrated confidence measure to minimize the word error rate (WER). The conventional criteria for WER minimization obtain the expected WER of a sentence hypothesis merely by comparing it with other hypotheses in an n-best list. The proposed criterion estimates the expected WER by using an integrated confidence measure with word posterior probabilities for a given acoustic input. The integrated confidence measure, which is implemented as a classifier based on maximum entropy (ME) modeling or support vector machines (SVMs), is used to acquire probabilities reflecting whether the word hypotheses are correct. The classifier is comprised of a variety of confidence measures and can deal with a temporal sequence of them to attain a more reliable confidence. Our proposed criterion for minimizing WER achieved a WER of 9.8% and a 3.9% reduction, relative to conventional n-best rescoring methods in transcribing Japanese broadcast news in various environments such as under noisy field and spontaneous speech conditions.

[1]  Andreas Stolcke,et al.  Efficient lattice representation and generation , 1998, ICSLP.

[2]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[3]  Gunnar Evermann,et al.  Posterior probability decoding, confidence estimation and system combination , 2000 .

[4]  Kazuo Onoe,et al.  Time dependent language model for broadcast news transcription and its post-correction , 1998, ICSLP.

[5]  Thorsten Joachims,et al.  Learning to classify text using support vector machines - methods, theory and algorithms , 2002, The Kluwer international series in engineering and computer science.

[6]  Mikio Nakano,et al.  Using untranscribed user utterances for improving language models based on confidence scoring , 2003, INTERSPEECH.

[7]  J. Darroch,et al.  Generalized Iterative Scaling for Log-Linear Models , 1972 .

[8]  Stephen Cox,et al.  Some statistical issues in the comparison of speech recognition algorithms , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[9]  Mitch Weintraub,et al.  Explicit word error minimization in n-best list rescoring , 1997, EUROSPEECH.

[10]  Herbert Gish,et al.  Evaluation of word confidence for speech recognition systems , 1999, Comput. Speech Lang..

[11]  Bhiksha Raj,et al.  A boosting approach for confidence scoring , 2001, INTERSPEECH.

[12]  Hiroyuki Segi,et al.  Simultaneous Subtitling System for Broadcast News Programs with a Speech Recognizer(Special Issue on the 2001 IEICE Excellent Paper Award) , 2003 .

[13]  Thomas Schaaf,et al.  Estimating confidence using word lattices , 1997, EUROSPEECH.

[14]  Adam L. Berger,et al.  A Maximum Entropy Approach to Natural Language Processing , 1996, CL.

[15]  Alexander H. Waibel,et al.  Recognition of conversational telephone speech using the JANUS speech engine , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[16]  Stanley F. Chen,et al.  A Gaussian Prior for Smoothing Maximum Entropy Models , 1999 .

[17]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[18]  Dilek Z. Hakkani-Tür,et al.  Active learning: theory and applications to automatic speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.

[19]  T. Imai Speech recognition for subtitling Japanese live broadcasts , 2004 .

[20]  Dilek Z. Hakkani-Tür,et al.  Active and unsupervised learning for automatic speech recognition , 2003, INTERSPEECH.

[21]  Hermann Ney,et al.  Confidence measures for large vocabulary continuous speech recognition , 2001, IEEE Trans. Speech Audio Process..

[22]  Vaibhava Goel,et al.  LVCSR rescoring with modified loss functions: a decision theoretic perspective , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[23]  Joseph Polifroni,et al.  Recognition confidence scoring and its use in speech understanding systems , 2002, Comput. Speech Lang..