System combination by driven decoding

In this paper, we propose an integrated approach to system combination, named the Driven Decoding Algorithm (DDA). It consists of guiding the search algorithm of a primary ASR system with the outputs of auxiliary systems. We first evaluate this method in a simple configuration in which the primary search is driven by the one-best hypothesis of a single auxiliary system. We then generalize DDA to confusion-network driven decoding and propose a general scheme for combining multiple systems. The extended DDA is evaluated using three ASR systems from different labs. Results show that generalized DDA significantly outperforms the ROVER method: we obtain a 15.7% relative word error rate improvement with respect to the best single system, as opposed to 8.5% with the ROVER combination.
