Investigations on ensemble based semi-supervised acoustic model training

Semi-supervised learning has been recognized as an effective way to improve acoustic model training when sufficient transcribed data are not available. Unlike most existing approaches, which use a single acoustic model and focus on how to refine it, this paper investigates the feasibility of using ensemble methods for semi-supervised acoustic model training. Two methods are investigated: the first is a generalized Boosting algorithm; the second is based on data partitioning. Both methods demonstrate substantial improvement over the baseline: a relative word error rate reduction of more than 15% was observed in our experiments on a large real-world meeting recognition dataset.
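The data-partitioning idea the abstract describes can be sketched in simplified form: split the transcribed data into partitions, train one model per partition, and use the committee's agreement on untranscribed data as a confidence filter before retraining on the enlarged pool. The sketch below is illustrative only, under strong assumptions — toy one-dimensional "features" and nearest-centroid "models" stand in for real acoustic models, and unanimous voting stands in for the confidence measures used in practice; all function names are hypothetical:

```python
import random

def train_centroids(data):
    """Toy stand-in for acoustic model training: per-class mean of 1-D features.

    data: list of (feature, label) pairs.
    """
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Classify x by the nearest class centroid."""
    return min(model, key=lambda y: abs(x - model[y]))

def ensemble_self_train(labeled, unlabeled, n_parts=3, seed=0):
    """Partition the labeled data, train one model per partition, then accept
    an unlabeled point only when all ensemble members agree on its label
    (a crude confidence filter), and retrain on the enlarged pool."""
    rng = random.Random(seed)
    shuffled = labeled[:]
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    models = [train_centroids(p) for p in parts]

    pseudo = []
    for x in unlabeled:
        votes = [predict(m, x) for m in models]
        if len(set(votes)) == 1:  # unanimous vote -> treated as confident
            pseudo.append((x, votes[0]))

    # Retrain a single model on labeled data plus confident pseudo-labels.
    return train_centroids(labeled + pseudo)
```

In a real system, the per-partition recognizers would decode the untranscribed audio, and agreement-based selection (or ROVER-style voting over the hypotheses) would replace the unanimous-vote filter used here.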
