Building multiple complementary systems using directed decision trees

Large vocabulary speech recognition systems typically combine the outputs of multiple systems to obtain the final hypothesis. For combination to give gains, the systems being combined must be complementary, i.e., they must make different errors. Complementary systems are often chosen simply by training multiple systems, performing all combinations, and selecting the best. This approach becomes time-consuming as more candidate systems are considered, so recent work has looked at explicitly building systems to be complementary to each other. This paper considers building multiple complementary systems based on directed decision trees and combining them within a multi-pass adaptive framework. A tree divergence is introduced so that trees can be compared easily without having to build entire systems. Experiments on a Broadcast News Arabic task show that gains can be achieved by using more than one complementary system.
