Lattice-based unsupervised acoustic model training

Unsupervised acoustic model training has been used successfully to improve the performance of automatic speech recognition systems when only a small amount of manually transcribed data is available for the target domain. The most common approach is to use automatic transcriptions to guide acoustic model estimation. However, since the best recognition hypotheses are known to contain errors, we propose to consider multiple transcription hypotheses during training. The idea is that the EM process can exploit the estimated posterior probabilities of these hypotheses to converge to a better solution. The proposed unsupervised training method is based on lattices. On a Broadcast News transcription task, lattice-based training gives a 2.2% relative improvement over 1-best training and converges faster in iterative incremental training.
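The abstract does not say how lattice posteriors enter the EM updates. As a hedged illustration only, the Python sketch below assumes a toy word lattice. The `Arc` structure, the topologically ordered integer node ids, and the per-arc 1-D "frames" are hypothetical simplifications. The sketch computes arc posteriors with the standard forward-backward algorithm and uses them as soft weights when re-estimating a per-label mean. 1-best training corresponds to giving weight 1 to the arcs on the Viterbi path and 0 to all others. Real systems accumulate state-level statistics inside each arc and usually scale acoustic scores before normalizing. The `acoustic_scale` parameter stands in for that scaling and is an assumption here, not a detail taken from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Arc(NamedTuple):
    """One word arc of a toy lattice (hypothetical structure, not a real toolkit format)."""
    start: int           # node ids are assumed topologically ordered: start < end
    end: int
    label: str
    ac_logp: float       # acoustic log-likelihood of the arc
    lm_logp: float       # language-model log-probability of the arc
    frames: List[float]  # toy 1-D features assumed aligned to this arc


def logsumexp(values: List[float]) -> float:
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def arc_scores(arcs: List[Arc], acoustic_scale: float) -> List[float]:
    return [acoustic_scale * a.ac_logp + a.lm_logp for a in arcs]


def arc_posteriors(arcs: List[Arc], start: int, end: int, acoustic_scale: float = 1.0) -> List[float]:
    """Forward-backward over the lattice; returns the posterior probability of each arc."""
    scores = arc_scores(arcs, acoustic_scale)
    nodes = sorted({a.start for a in arcs} | {a.end for a in arcs})
    in_arcs, out_arcs = defaultdict(list), defaultdict(list)
    for i, a in enumerate(arcs):
        out_arcs[a.start].append(i)
        in_arcs[a.end].append(i)
    alpha = {n: -math.inf for n in nodes}
    beta = {n: -math.inf for n in nodes}
    alpha[start], beta[end] = 0.0, 0.0
    for n in nodes:  # forward pass in topological order
        if in_arcs[n]:
            alpha[n] = logsumexp([alpha[arcs[i].start] + scores[i] for i in in_arcs[n]])
    for n in reversed(nodes):  # backward pass in reverse topological order
        if out_arcs[n]:
            beta[n] = logsumexp([scores[i] + beta[arcs[i].end] for i in out_arcs[n]])
    total = alpha[end]  # log-likelihood of the whole lattice
    return [math.exp(alpha[a.start] + s + beta[a.end] - total) for a, s in zip(arcs, scores)]


def one_best_weights(arcs: List[Arc], start: int, end: int, acoustic_scale: float = 1.0) -> List[float]:
    """Viterbi 1-best path: weight 1.0 for arcs on the path, 0.0 for all others."""
    scores = arc_scores(arcs, acoustic_scale)
    best = {start: (0.0, -1)}  # node -> (best path score, index of best incoming arc)
    # Processing arcs by start node is valid because every arc has start < end.
    for i in sorted(range(len(arcs)), key=lambda j: arcs[j].start):
        a = arcs[i]
        if a.start in best:
            cand = best[a.start][0] + scores[i]
            if a.end not in best or cand > best[a.end][0]:
                best[a.end] = (cand, i)
    weights = [0.0] * len(arcs)
    node = end
    while node != start:  # trace back the best path
        i = best[node][1]
        weights[i] = 1.0
        node = arcs[i].start
    return weights


def estimate_means(arcs: List[Arc], weights: List[float]) -> Dict[str, float]:
    """M-step for a per-label mean, weighting each arc's frames by the arc's weight."""
    num, den = defaultdict(float), defaultdict(float)
    for a, w in zip(arcs, weights):
        for x in a.frames:
            num[a.label] += w * x
            den[a.label] += w
    return {lab: num[lab] / den[lab] for lab in num if den[lab] > 0.0}


# Toy lattice: "a" and "b" compete for the same first segment; "b" is certain in the second.
EXAMPLE_ARCS = [
    Arc(0, 1, "a", -1.0, -0.5, [1.0, 1.2]),
    Arc(0, 1, "b", -2.0, -0.5, [1.0, 1.2]),
    Arc(1, 2, "b", -1.0, 0.0, [3.0]),
]
```

On this toy lattice, the 1-best estimate for `b` uses only the second segment (mean 3.0). The lattice estimate also gives the first segment a soft count of about 0.27 per frame, which pulls the mean of `b` to roughly 2.34. This is the kind of posterior-weighted accumulation the abstract alludes to, shown here under the stated simplifications.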
