Complementary System Generation using Directed Decision Trees

Large vocabulary continuous speech recognition (LVCSR) systems often use a multi-pass decoding strategy in which multiple systems are combined in the final stage. For combination to reduce the error rate, these systems must be complementary, i.e. they must make different errors. Previously, complementary systems have been generated by independently training a number of models, explicitly evaluating all combinations, and picking the best-performing one. This approach becomes infeasible as the number of candidate systems increases, and it does not guarantee that any of the models will be complementary. This paper presents an algorithm for generating complementary systems by altering the decision tree generation. Confusions made by a baseline system are resolved by separating confusable states that the standard decision tree algorithm might otherwise have clustered together. Experimental results on a Mandarin broadcast news task show gains when the baseline is combined with a complementary directed decision tree system.
