Paper information - Word Error Rate Minimization Using an Integrated Confidence Measure

This paper describes a new criterion for speech recognition that uses an integrated confidence measure to minimize the word error rate (WER). Conventional criteria for WER minimization obtain the expected WER of a sentence hypothesis merely by comparing it with the other hypotheses in an n-best list. The proposed criterion instead estimates the expected WER by using an integrated confidence measure together with word posterior probabilities for a given acoustic input. The integrated confidence measure, implemented as a classifier based on maximum entropy (ME) modeling or support vector machines (SVMs), yields probabilities reflecting whether each word hypothesis is correct. The classifier combines a variety of confidence measures and can handle a temporal sequence of them to attain a more reliable confidence. In transcribing Japanese broadcast news in various environments, such as noisy field conditions and spontaneous speech, the proposed criterion achieved a WER of 9.8%, a 3.9% reduction relative to conventional n-best rescoring methods.
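The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names (`edit_distance`, `nbest_min_expected_errors`, `confidence_expected_errors`) and the toy posteriors are hypothetical. The first criterion follows the general idea of picking the n-best hypothesis with the smallest posterior-weighted edit distance to the other hypotheses. The second assumes, purely for illustration, that a classifier has already produced a per-word probability of correctness and that the expected error count is the sum of the complements; this simplification ignores deletions and is not taken from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def nbest_min_expected_errors(nbest: List[Tuple[List[str], float]]) -> List[str]:
    """Conventional-style criterion: return the hypothesis whose expected
    word error count, measured against the n-best list itself, is smallest."""
    total = sum(p for _, p in nbest)  # normalize posteriors over the list

    def expected_errors(hyp: List[str]) -> float:
        return sum(p / total * edit_distance(other, hyp) for other, p in nbest)

    return min((h for h, _ in nbest), key=expected_errors)


def confidence_expected_errors(confidences: Sequence[float]) -> float:
    """Assumed simplification: expected number of wrong words in a hypothesis
    given per-word probabilities of being correct (deletions are ignored)."""
    return sum(1.0 - c for c in confidences)


# Toy n-best list with made-up posteriors (hypothetical data).
nbest = [
    (["a", "b", "c"], 0.40),
    (["a", "x", "c"], 0.35),
    (["a", "x", "d"], 0.25),
]
best = nbest_min_expected_errors(nbest)
```

In this toy list the hypothesis with the highest posterior is `a b c`, but the expected-error criterion selects `a x c`, because it lies closer on average to the competing hypotheses. A confidence-based estimate replaces that list-internal comparison with scores from a trained classifier, which is the direction the abstract describes.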
