Cross-Validation EM Training for Robust Parameter Estimation

A new maximum likelihood training algorithm is proposed that compensates for weaknesses of the EM algorithm by using cross-validation likelihood in the expectation step to avoid overtraining. Because it uses a set of sufficient statistics associated with a partitioning of the training data, as in parallel EM, the algorithm has the same order of computational requirements as the original EM algorithm. Analyses using a GMM with artificial data show that the proposed algorithm is more robust to overtraining than the conventional EM algorithm. Large-vocabulary recognition experiments on Mandarin broadcast news data show that the method makes better use of a larger number of parameters and gives lower recognition error rates than EM training.
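The idea summarized above can be illustrated with a minimal sketch for a one-dimensional GMM. This is not the authors' implementation. It assumes that the training data are split into folds, that each fold keeps its own sufficient statistics, and that the posteriors for a fold are computed with a model built from the statistics of all the other folds. The names `cv_em`, `accumulate`, `params_from_stats`, `n_folds`, and `var_floor`, as well as the initialization and the variance floor, are hypothetical choices made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def params_from_stats(stats, var_floor=1e-3):
    """M-step: turn per-component (count, sum x, sum x^2) into (weight, mean, var)."""
    counts = [max(s[0], 1e-8) for s in stats]  # guard against empty components
    total = sum(counts)
    params = []
    for n, (_, sx, sxx) in zip(counts, stats):
        mean = sx / n
        var = max(sxx / n - mean * mean, var_floor)  # assumed variance floor
        params.append((n / total, mean, var))
    return params


def accumulate(data, params):
    """E-step over one fold: accumulate per-component sufficient statistics."""
    stats = [[0.0, 0.0, 0.0] for _ in params]
    for x in data:
        likes = [w * gaussian_pdf(x, m, v) for w, m, v in params]
        norm = sum(likes) or 1e-300  # avoid division by zero on underflow
        for c, like in enumerate(likes):
            g = like / norm  # posterior of component c for sample x
            stats[c][0] += g
            stats[c][1] += g * x
            stats[c][2] += g * x * x
    return stats


def add_stats(stats_list):
    """Sum the sufficient statistics of several folds."""
    n_comp = len(stats_list[0])
    return [[sum(s[c][i] for s in stats_list) for i in range(3)]
            for c in range(n_comp)]


def cv_em(data, init_params, n_folds=4, n_iters=10):
    """Cross-validation EM sketch: fold k is scored by a model that never saw fold k."""
    folds = [data[k::n_folds] for k in range(n_folds)]
    # Assumed initialization: one ordinary E-step per fold with a shared model.
    fold_stats = [accumulate(fold, init_params) for fold in folds]
    for _ in range(n_iters):
        total = add_stats(fold_stats)
        new_stats = []
        for k, fold in enumerate(folds):
            # Statistics of all folds except k, obtained by subtraction so the
            # per-iteration cost stays of the same order as ordinary EM.
            held_out = [[total[c][i] - fold_stats[k][c][i] for i in range(3)]
                        for c in range(len(total))]
            new_stats.append(accumulate(fold, params_from_stats(held_out)))
        fold_stats = new_stats
    # The final model pools the statistics of every fold.
    return params_from_stats(add_stats(fold_stats))
```

Calling `cv_em(samples, [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)])` on a list of floats returns one `(weight, mean, variance)` tuple per component. Setting `n_folds=1` is not meaningful in this sketch, since the held-out statistics would then be empty.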
