Paper information - Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training

Abstract

Using the central observation that margin-based weighted classification error (modeled using Minimum Phone Error (MPE)) corresponds to the derivative, with respect to the margin term, of margin-based hinge loss (modeled using Maximum Mutual Information (MMI)), this article subsumes and extends margin-based MPE and MMI within a broader framework in which the objective function is an integral of MPE loss over a range of margin values. Applying the Fundamental Theorem of Calculus, this integral is easily evaluated using finite differences of MMI functionals; lattice-based training using the new criterion can then be carried out using differences of MMI gradients. Experimental results comparing the new framework with margin-based MMI, MCE and MPE on the Corpus of Spontaneous Japanese and the MIT OpenCourseWare/MIT-World corpus are presented.

1. Introduction

The field of discriminative training for speech recognition has witnessed considerable activity in recent years. The appeal of minimizing phone or word error rather than string error has motivated a transition from well-known string-level methods such as MMI and MCE [1][2] to error-weighted approaches such as MPE [3][4]. More recently, there has been a surge in proposals for “large margin” approaches to hidden Markov model (HMM) design, such as the “large-margin HMM” [5], “soft margin estimation” [6], and incrementally shifted MCE loss [7]. Sha and Saul [8] made the important proposal that a fine-grained error measure, such as the Hamming distance between candidate recognition strings, be itself directly incorporated into the margin term for HMM-based learning. It turns out that a margin term multiplying fine-grained error can easily be incorporated into MMI-, MCE- and MPE-based HMM training as well, simply by adding margin-scaled local frame/phone/word error to lattice arc log-likelihoods during the Forward-Backward computation [9][10][11]. This approach links the original use of margin in machine learning (e.g. Support Vector Machines (SVMs)) with margin in “tried-and-tested” frameworks for large-scale discriminative training, which come with well-understood methods for HMM optimization on large-scale ASR tasks. Performance benefits on large-scale tasks have been reported for the use of margin in MMI and MPE, though the relative gains appear to be larger for MMI than for MPE [10][11].

Aiming to leverage the benefits of margin use within MPE-style error-weighted HMM training, this article presents a unification of margin-based MMI and MPE training based on a novel concept:
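The central observation of the abstract can be checked numerically on a toy example. The sketch below is not the authors' implementation: it assumes an N-best list in place of a lattice, string-level errors E(s) measured against a reference with E(ref) = 0, a log-sum-exp (soft-max hinge) form for the margin-based MMI loss, and hypothetical function names such as `mmi_margin_loss` and `integrated_mpe_loss`. Under these assumptions, the derivative of the MMI loss with respect to the margin σ is the expected error under the margin-boosted posteriors, so integrating that MPE-style loss over [σ_lo, σ_hi] reduces to a difference of two MMI losses, and its gradient to a difference of posteriors.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def boosted_scores(loglik, errors, sigma):
    """Add the margin-scaled error sigma * E(s) to each hypothesis score."""
    return [ll + sigma * e for ll, e in zip(loglik, errors)]


def mmi_margin_loss(loglik, errors, ref, sigma):
    """Margin-based MMI loss on an N-best list (soft-max hinge form).

    Log of the boosted sum over all hypotheses (reference included) minus
    the reference log-likelihood. Assumes errors[ref] == 0, so the
    reference score itself is never boosted.
    """
    return logsumexp(boosted_scores(loglik, errors, sigma)) - loglik[ref]


def boosted_posteriors(loglik, errors, sigma):
    """Hypothesis posteriors gamma_s(sigma) under the boosted scores."""
    scores = boosted_scores(loglik, errors, sigma)
    z = logsumexp(scores)
    return [math.exp(s - z) for s in scores]


def mpe_margin_loss(loglik, errors, sigma):
    """Expected error under the boosted posteriors (MPE-style loss).

    Equals d/dsigma of mmi_margin_loss, because the reference term
    does not depend on sigma.
    """
    gammas = boosted_posteriors(loglik, errors, sigma)
    return sum(g * e for g, e in zip(gammas, errors))


def integrated_mpe_loss(loglik, errors, ref, sigma_lo, sigma_hi):
    """Integral of the MPE-style loss over [sigma_lo, sigma_hi].

    By the Fundamental Theorem of Calculus this is a difference of two
    margin-based MMI losses; no numerical quadrature is needed.
    """
    return (mmi_margin_loss(loglik, errors, ref, sigma_hi)
            - mmi_margin_loss(loglik, errors, ref, sigma_lo))


def integrated_mpe_grad(loglik, errors, sigma_lo, sigma_hi):
    """Gradient of integrated_mpe_loss w.r.t. each hypothesis log-likelihood.

    Each MMI gradient is gamma_s(sigma) - [s == ref]; the reference
    indicator cancels in the difference, leaving a posterior difference.
    """
    g_hi = boosted_posteriors(loglik, errors, sigma_hi)
    g_lo = boosted_posteriors(loglik, errors, sigma_lo)
    return [h - l for h, l in zip(g_hi, g_lo)]


if __name__ == "__main__":
    # Hypothetical N-best list; hypothesis 0 is the reference (zero error).
    loglik = [-10.0, -10.5, -11.0, -12.0]
    errors = [0.0, 1.0, 2.0, 3.0]
    lo, hi, n = 0.0, 2.0, 2000
    h = (hi - lo) / n
    # Midpoint-rule quadrature of the MPE-style loss, for comparison only.
    numeric = h * sum(mpe_margin_loss(loglik, errors, lo + (k + 0.5) * h)
                      for k in range(n))
    closed = integrated_mpe_loss(loglik, errors, 0, lo, hi)
    print(f"quadrature={numeric:.6f}  MMI difference={closed:.6f}")
```

In a real system the hypothesis sum would be computed by Forward-Backward over a lattice, with σ times the local frame/phone/word error added to each arc log-likelihood; the N-best list here only keeps the example short.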

Shinji Watanabe | Atsushi Nakamura | Erik McDermott

[1] Hermann Ney, et al. Investigations on error minimizing training criteria for discriminative training in automatic speech recognition, 2005, INTERSPEECH.

[2] Shigeru Katagiri, et al. A unified view for discriminative objective functions based on negative exponential of difference measure between strings, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[3] Hung-An Chang, et al. Discriminative training of hierarchical acoustic models for large vocabulary continuous speech recognition, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4] Dong Yu, et al. Large-margin minimum classification error training: A theoretical risk minimization perspective, 2008, Comput. Speech Lang.

[5] Hui Jiang, et al. Large margin HMMs for speech recognition, 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[6] Daniel Povey, et al. Minimum Phone Error and I-smoothing for improved discriminative training, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7] Brian Kingsbury, et al. Boosted MMI for model and feature-space discriminative training, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[8] Atsushi Nakamura, et al. String and lattice based discriminative training for the corpus of spontaneous Japanese lecture transcription task, 2007, INTERSPEECH.

[9] Georg Heigold, et al. Modified MMI/MPE: a direct evaluation of the margin in speech recognition, 2008, ICML '08.

[10] Jinyu Li, et al. A study on soft margin estimation for LVCSR, 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[11] Dwi Sianto Mansjur, et al. Non-Uniform error criteria for automatic pattern and speech recognition, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[12] Jonathan Le Roux, et al. Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[13] Lawrence K. Saul, et al. Large Margin Hidden Markov Models for Automatic Speech Recognition, 2006, NIPS.

[14] George Saon, et al. Penalty function maximization for large margin HMM training, 2008, INTERSPEECH.