Margin-Based Discriminative Training for String Recognition

Typical training criteria for string recognition, such as minimum phone error (MPE) and maximum mutual information (MMI) in speech recognition, are based on a (regularized) loss function. In contrast, large-margin classifiers, the de facto standard in machine learning, maximize the separation margin, with an additional loss term that penalizes misclassified samples. This paper shows how typical training criteria such as MPE or MMI can be extended to incorporate the margin concept, and that the resulting modified training criteria are smooth approximations to support vector machines with the respective loss function. The proposed approach takes advantage of the generalization bounds of large-margin classifiers while keeping the efficient framework of conventional discriminative training. This allows us to directly evaluate the utility of the margin term for string recognition. Experimental results are presented using the proposed modified training criteria for different tasks from speech recognition (including large-vocabulary continuous speech recognition tasks trained on up to 1500 h of audio data), part-of-speech tagging, and handwriting recognition.
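The claim that a margin-extended criterion is a smooth approximation to a support vector machine can be illustrated on a toy problem. The sketch below is not the formulation used in the paper; it assumes a small set of competing hypotheses with fixed scores, a 0/1 cost per hypothesis, a margin parameter `rho`, and a scaling factor `gamma` (all names are hypothetical). Under these assumptions, an MMI-style log-loss with a margin offset added to the competing scores approaches the margin-rescaled hinge loss as `gamma` grows.

```python
import math


def margin_mmi_loss(scores, correct, rho=1.0, gamma=1.0):
    """MMI-style log-loss with a margin offset (illustrative assumption).

    Each competing hypothesis has its score raised by rho times a 0/1 cost;
    the correct hypothesis has cost 0. The result is divided by gamma so
    that it can be compared with the hinge loss directly.
    """
    costs = [0.0 if i == correct else 1.0 for i in range(len(scores))]
    z = [gamma * (s - scores[correct] + rho * c) for s, c in zip(scores, costs)]
    m = max(z)  # subtract the maximum for a numerically stable log-sum-exp
    return (m + math.log(sum(math.exp(v - m) for v in z))) / gamma


def hinge_loss(scores, correct, rho=1.0):
    """Margin-rescaled hinge loss over the same hypotheses."""
    worst = max(s + rho for i, s in enumerate(scores) if i != correct)
    return max(0.0, worst - scores[correct])


if __name__ == "__main__":
    scores = [2.0, 1.5, -1.0]  # hypothetical scores; index 0 is correct
    for gamma in (1.0, 10.0, 100.0):
        print(gamma, margin_mmi_loss(scores, 0, gamma=gamma))
    print("hinge", hinge_loss(scores, 0))
```

In this toy setting the smooth loss is never smaller than the hinge loss and tightens toward it as `gamma` increases, which is one way to read the "smooth approximation" statement above.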
