An Overview of Discriminative Training for Speech Recognition

This paper gives an overview of discriminative training as it applies to speech recognition. It discusses the basic theory of discriminative training and explains maximum mutual information (MMI). It then explores common problems inherent to discriminative training, along with the practicalities of implementing it for large vocabulary recognition. Alternatives to the MMI objective function, such as minimum word error (MWE) and minimum phone error (MPE), are discussed, and the application of discriminative techniques to adaptation is described. Finally, possible avenues for future research are outlined.
