System Combination by Driven Decoding

The combination of automatic speech recognition (ASR) systems generally relies on a posteriori merge of system outputs or on a cross-adaptation. In this paper, we propose an integrated approach where the search of a primary system is driven by the outputs of a secondary one. This method allows to drive the primary system search by using the one-best hypotheses and the word posteriors gathered from the secondary system. Experiments are carried out within the experimental framework of the ESTER evaluation campaign (S. Galliano et al. 2005). Results show that the driven decoding algorithm significantly outperforms the two single ASR systems (-8% of relative WER, -1.7% absolute). Finally, we investigate the interactions between driven decoding and cross-adaptations. The best cross-adaptation strategy in combination with the driven decoding process brings to a final absolute gain of about 1.9% WER.