Round-robin discrimination model for reranking ASR hypotheses

We propose a novel model training method for reranking problems. In our approach, named round-robin duel discrimination (R2D2), the model is trained so that every pair of samples can be distinguished from each other. The R2D2 loss function for a log-linear model is concave, so the global optimum can be found with a simple parameter estimation method such as gradient descent. We also describe the relationship between R2D2 and the global conditional log-linear model (GCLM); R2D2 can be viewed as an extension of GCLM. We evaluate R2D2 on an error-corrective language model for speech recognition. Experimental results on the Corpus of Spontaneous Japanese show that R2D2 provides an accurate model with high generalization ability.
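As a rough illustration only (not the paper's exact formulation), the all-pairs "round-robin" idea can be sketched as a pairwise logistic objective over a log-linear reranker: for every pair of N-best hypotheses, the lower-error one should outscore the higher-error one. The feature layout, the use of word error counts as the quality measure, and the plain gradient-descent loop below are all assumptions made for this sketch.

```python
import numpy as np

def pairwise_loss_and_grad(w, feats, errors):
    """Negative log-likelihood that, in every pairwise 'duel', the
    lower-error hypothesis outscores the higher-error one.

    feats  : (n, d) feature vectors of the n-best hypotheses
    errors : (n,)   word error counts (lower is better) -- an assumed
                    stand-in for the paper's quality measure
    """
    scores = feats @ w
    loss, grad = 0.0, np.zeros_like(w)
    for i in range(len(errors)):
        for j in range(len(errors)):
            if errors[i] < errors[j]:  # hypothesis i should win this duel
                diff = feats[i] - feats[j]
                margin = scores[i] - scores[j]
                p = 1.0 / (1.0 + np.exp(-margin))  # sigmoid of score gap
                loss += -np.log(p)
                grad += -(1.0 - p) * diff          # gradient of -log sigmoid
    return loss, grad

def train(feats, errors, lr=0.1, steps=200):
    """Plain gradient descent; this pairwise objective is convex in w,
    mirroring the abstract's point that a simple method suffices to
    reach the global optimum."""
    w = np.zeros(feats.shape[1])
    for _ in range(steps):
        _, g = pairwise_loss_and_grad(w, feats, errors)
        w -= lr * g
    return w
```

In practice, a quasi-Newton method such as L-BFGS would typically replace the fixed-step loop, but the key point carried over from the abstract is that the objective has a single global optimum.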
