Paper information - Generalized driven decoding for speech recognition system combination

Generalized driven decoding for speech recognition system combination

The driven decoding algorithm (DDA) was initially an integrated approach for combining two automatic speech recognition (ASR) systems. It guides the search algorithm of a primary ASR system with the one-best hypothesis of an auxiliary system. In this paper, we generalize DDA to confusion-network driven decoding and propose new schemes for multiple-system combination. Whereas previous experiments involved two ASR systems on broadcast news data, the proposed extended DDA is evaluated using three ASR systems from different labs. Results show that generalized DDA significantly outperforms the ROVER method: we obtain a 15.7% relative word error rate improvement with respect to the best single system, as opposed to 8.5% with the ROVER combination.
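The "relative word error rate improvement" quoted in the abstract is measured against the best single system's WER rather than in absolute points. A minimal sketch of that arithmetic follows; the function name and the input values are hypothetical, chosen only for illustration, and are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, combined_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - combined_wer) / baseline_wer


# Hypothetical WERs (in percent), NOT taken from the paper:
# a best single system at 20.0% and a combined system at 17.0%.
print(round(relative_wer_reduction(20.0, 17.0), 1))
```

With these illustrative inputs, a 3-point absolute drop from a 20% baseline corresponds to a 15% relative improvement, which is the kind of quantity the abstract reports for DDA and ROVER.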

Georges Linarès | Benjamin Lecouteux | Guillaume Gravier | Yannick Estève
