Segmental minimum Bayes-risk ASR voting strategies

ROVER [6] and its successor voting procedures have been shown to be quite effective in reducing the recognition word error rate (WER). The success of these methods has been attributed to their minimum Bayes-risk (MBR) nature: they produce the hypothesis with the least expected word error. In this paper we develop a general procedure within the MBR framework, called segmental MBR recognition, that encompasses current voting techniques and allows further extensions that yield a lower expected WER. It also allows the incorporation of loss functions other than the WER. We present a derivation of the voting procedure of N-best ROVER as an instance of segmental MBR recognition. We then present an extension, called e-ROVER, that alleviates some of the restrictions of N-best ROVER by better approximating the WER. e-ROVER is compared with N-best ROVER on a multilingual acoustic modeling task and is shown to yield modest but significant improvements that are easily obtained.
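To make the MBR idea concrete, the following is a minimal Python sketch, not the paper's segmental MBR or e-ROVER procedure. `nbest_mbr` picks, from an N-best list, the hypothesis with the least expected word edit distance, assuming the posteriors are already normalized over the list. `segmental_vote` illustrates the segmental view: assuming the hypotheses have already been aligned into word slots (the alignment step, which ROVER builds by dynamic-programming alignment, is omitted), MBR under a per-slot 0/1 loss reduces to choosing the highest-posterior word in each slot. All function names and input formats here are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences (the WER numerator)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def nbest_mbr(nbest: List[Tuple[List[str], float]]) -> List[str]:
    """Return the N-best entry with the least expected word error.

    nbest: (word sequence, posterior) pairs; the posteriors are assumed
    to be normalized over the list, so each entry also serves as a
    possible "true" transcription when computing the expectation.
    """
    def expected_loss(hyp: List[str]) -> float:
        return sum(p * word_edit_distance(ref, hyp) for ref, p in nbest)

    return min((hyp for hyp, _ in nbest), key=expected_loss)


def segmental_vote(segments: List[Dict[str, float]]) -> List[str]:
    """Per-slot MBR under 0/1 loss: keep the max-posterior word in each slot.

    segments: one dict per pre-aligned slot mapping word -> posterior mass;
    the empty string "" stands for "no word" (a deletion) in that slot.
    """
    output = []
    for slot in segments:
        best = max(slot, key=slot.get)
        if best:  # drop slots where the empty word wins the vote
            output.append(best)
    return output


if __name__ == "__main__":
    nbest = [(["a", "b"], 0.4), (["c", "d"], 0.3), (["c", "b"], 0.3)]
    print(nbest_mbr(nbest))  # ['c', 'b'], although ['a', 'b'] is the MAP entry
    slots = [{"a": 0.4, "c": 0.6}, {"b": 0.7, "d": 0.3}]
    print(segmental_vote(slots))  # ['c', 'b']
```

In the example, the most probable single entry `['a', 'b']` has an expected edit distance of 0.9, while `['c', 'b']` has 0.7 because it agrees partially with every other hypothesis. This is the effect that MBR voting exploits.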

[1] Vaibhava Goel et al. Task dependent loss functions in speech recognition: A* search over recognition lattices. EUROSPEECH, 1999.

[2] Gunnar Evermann et al. Posterior probability decoding, confidence estimation and system combination. 2000.

[3] Andreas Stolcke et al. Finding consensus among words: lattice-based word error minimization. EUROSPEECH, 1999.

[4] William J. Byrne et al. Towards language independent acoustic modeling. 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000.

[5] Vaibhava Goel et al. Minimum Bayes-risk automatic speech recognition. Computer Speech and Language, 2000.

[6] Jonathan G. Fiscus et al. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER). 1997 IEEE Workshop on Automatic Speech Recognition and Understanding, 1997.