On maximum mutual information speaker-adapted training

In this work, we combine maximum mutual information-based parameter estimation with speaker-adapted training (SAT). As will be shown, this can be achieved by performing unsupervised parameter estimation on the test data, a distinct advantage for many recognition tasks involving conversational speech. We also propose an approximation to the maximum likelihood and maximum mutual information SAT re-estimation formulae that greatly reduces the amount of disk space required to conduct training on corpora such as Broadcast News, which contains speech from thousands of speakers. We present the results of a set of speech recognition experiments on three test sets: the English Spontaneous Scheduling Task corpus, Broadcast News, and a new corpus of Meeting Room data collected at the Interactive Systems Laboratories of the Carnegie Mellon University.

[1]  William J. Byrne,et al.  Discriminative speaker adaptation with conditional maximum likelihood linear regression , 2001, INTERSPEECH.

[2]  Ralf Schlüter,et al.  Investigations on discriminative training criteria , 2000 .

[3]  Hagen Soltau,et al.  Advances in meeting recognition , 2001, HLT.

[4]  William J. Byrne,et al.  Speaker adaptation with all-pass transforms , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[5]  Daniel Povey,et al.  Minimum Phone Error and I-smoothing for improved discriminative training , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Mark J. F. Gales,et al.  Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..

[7]  Richard M. Schwartz,et al.  Practical Implementations of Speaker-Adaptive Training , 1997 .

[8]  John H. L. Hansen,et al.  Discrete-Time Processing of Speech Signals , 1993 .

[9]  Daniel Povey,et al.  Large scale discriminative training for speech recognition , 2000 .

[10]  Richard M. Schwartz,et al.  A compact model for speaker-adaptive training , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[11]  Alexander H. Waibel,et al.  Performance comparisons of all-pass transform adaptation with maximum likelihood linear regression , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Mark J. F. Gales,et al.  Semi-tied covariance matrices for hidden Markov models , 1999, IEEE Trans. Speech Audio Process..

[13]  Philip C. Woodland,et al.  Improvements in linear transform based speaker adaptation , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[14]  Alexander H. Waibel,et al.  Maximum mutual information speaker adapted training with semi-tied covariance matrices , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[15]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[16]  Dimitri Kanevsky,et al.  An inequality for rational functions with applications to some statistical estimation problems , 1991, IEEE Trans. Inf. Theory.

[17]  Vassilios Digalakis,et al.  Speaker adaptation using constrained estimation of Gaussian mixtures , 1995, IEEE Trans. Speech Audio Process..

[18]  Andreas Stolcke,et al.  Improved maximum mutual information estimation training of continuous density HMMs , 2001, INTERSPEECH.

[19]  T. W. Anderson An Introduction to Multivariate Statistical Analysis , 1959 .

[20]  Fernando Pereira,et al.  Efficient general lattice generation and rescoring , 1999, EUROSPEECH.

[21]  Yves Normandin,et al.  Hidden Markov models, maximum mutual information estimation, and the speech recognition problem , 1992 .

[22]  Steve J. Young,et al.  MMIE training of large vocabulary recognition systems , 1997, Speech Communication.

[23]  S. Eddy Hidden Markov models. , 1996, Current opinion in structural biology.

[24]  Mehryar Mohri,et al.  Network optimizations for large-vocabulary speech recognition , 1999, Speech Commun..

[25]  John B. Shoven,et al.  I , Edinburgh Medical and Surgical Journal.