Combinaison de systèmes par décodage guidé

In this paper, we propose an integrated approach to system combination named the Driven Decoding Algorithm (DDA). It consists in guiding the search algorithm of a primary ASR system with the outputs of auxiliary systems. We first evaluate this method in a simple configuration in which the primary search is driven by the one-best hypothesis of a single auxiliary system. Then, we generalize DDA to confusion-network driven decoding and propose a general scheme for combining multiple systems. The extended DDA is evaluated using 3 ASR systems from different labs. Results show that generalized DDA significantly outperforms the ROVER method: we obtain a 15.7% relative word error rate improvement with respect to the best single system, as opposed to 8.5% with the ROVER combination.

Georges Linarès | Benjamin Lecouteux | Guillaume Gravier | Yannick Estève
