Discriminative training based on an integrated view of MPE and MMI in margin and error space

Recent work has demonstrated that the Maximum Mutual Information (MMI) objective function is mathematically equivalent to a simple integral of recognition error, when the latter is expressed as a margin-based, Minimum Phone Error (MPE)-style, error-weighted objective function. This led to the proposal of a general approach to discriminative training based on integrals of MPE-style loss, calculated using “differenced MMI” (dMMI), a finite difference of MMI functionals evaluated at the edges of a margin interval. This article aims to clarify the essence and practical consequences of the new framework. The recently proposed Error-Indexed Forward-Backward Algorithm is used to visualize the close agreement between dMMI and MPE statistics for narrow margin intervals, and to illustrate the flexible control over the weight given to different error levels that broader intervals provide. New speech recognition results are presented for the MIT OpenCourseWare/MIT-World corpus, showing small performance gains for dMMI over MPE for some choices of margin interval. Evaluation with an expanded 44K-word trigram language model confirms that dMMI with a narrow margin interval yields the same performance as MPE.
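
As a brief illustrative sketch of the relationship described above (the notation here, with margin parameter \sigma, string error E(S, S_r), and model parameters \Lambda, is an assumption and need not match the paper's exact formulation), a margin-boosted MMI functional and its finite difference over a margin interval [\sigma_1, \sigma_2] can be written as

F_{\mathrm{MMI}}(\Lambda;\sigma) = \sum_r \log \frac{p_\Lambda(X_r, S_r)}{\sum_S p_\Lambda(X_r, S)\, e^{\sigma E(S, S_r)}},
\qquad
F_{\mathrm{dMMI}}(\Lambda;\sigma_1,\sigma_2) = \frac{F_{\mathrm{MMI}}(\Lambda;\sigma_2) - F_{\mathrm{MMI}}(\Lambda;\sigma_1)}{\sigma_2 - \sigma_1}.

Since \partial F_{\mathrm{MMI}}/\partial\sigma = -\sum_r \sum_S P_{\Lambda,\sigma}(S \mid X_r)\, E(S, S_r), i.e. the negative of an MPE-style expected error under the \sigma-boosted posterior, the finite difference above amounts to averaging that error-weighted quantity over the margin interval: a narrow interval reduces it to an MPE-style objective evaluated near \sigma_1, while a broad interval spreads the weight across a range of error levels and approaches MMI-like behavior.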
