Unsupervised training of subspace Gaussian mixture models for conversational telephone speech recognition

This paper presents our preliminary work on unsupervised training of subspace Gaussian mixture models (SGMMs) for an under-resourced conversational telephone speech (CTS) recognition task. The subspace model yields better performance than the conventional GMM, particularly with small or medium-sized training sets. Unsupervised learning is often used to save human transcription effort by automatically transcribing large speech archives, and the additional auto-transcribed data may help improve model accuracy. Experiments are carried out on two publicly available English conversational telephone speech corpora. Both GMM and SGMM systems, each combined with unsupervised learning, are examined and compared.
