Hallucinated n-best lists for discriminative language modeling

This paper investigates semi-supervised methods for discriminative language modeling, in which n-best lists are “hallucinated” for given reference text and then used to train n-gram language models with the perceptron algorithm. We perform controlled experiments on a very strong baseline English CTS system, comparing three methods for simulating ASR output, and compare the results against training on “real” n-best lists produced by the baseline recognizer. We find that methods based on extracting phrasal cohorts, similar to phrase-table extraction methods in machine translation, yield the largest gains of the three, achieving over half of the WER reduction obtained with fully supervised training.
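The training procedure summarized above follows the standard perceptron-based n-best reranking setup for discriminative n-gram language modeling. Below is a minimal sketch of that setup, assuming a simple data layout of (reference, n-best list) pairs; the n-best lists may be real recognizer output or hallucinated confusions generated from the reference text. All function and variable names (`ngram_features`, `perceptron_train`, etc.) are illustrative rather than taken from the paper, and the toy word-level error function stands in for a proper edit-distance alignment.

```python
# Sketch of perceptron training for a discriminative n-gram reranker.
# Hypothetical data layout: each training example pairs a reference transcript
# with an n-best list of (hypothesis_words, baseline_score) candidates, which
# may be real recognizer output or "hallucinated" confusions of the reference.
from collections import defaultdict


def ngram_features(words):
    """Count unigram and bigram features of a hypothesis."""
    feats = defaultdict(float)
    padded = ["<s>"] + list(words) + ["</s>"]
    for i, w in enumerate(padded):
        feats[("1g", w)] += 1.0
        if i + 1 < len(padded):
            feats[("2g", (w, padded[i + 1]))] += 1.0
    return feats


def word_error(hyp, ref):
    """Toy loss; a real system would use edit (Levenshtein) distance."""
    return sum(h != r for h, r in zip(hyp, ref)) + abs(len(hyp) - len(ref))


def rerank_score(weights, hyp, baseline_score, scale=1.0):
    """Combine the baseline recognizer score with the learned n-gram features."""
    return scale * baseline_score + sum(
        weights[f] * v for f, v in ngram_features(hyp).items()
    )


def perceptron_train(data, epochs=3):
    """data: list of (reference_words, [(hyp_words, baseline_score), ...])."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for ref, nbest in data:
            # Oracle = lowest-error hypothesis in the (possibly hallucinated) n-best list.
            oracle = min(nbest, key=lambda h: word_error(h[0], ref))[0]
            best = max(nbest, key=lambda h: rerank_score(weights, h[0], h[1]))[0]
            if best != oracle:
                # Perceptron update: promote features of the oracle hypothesis,
                # demote features of the current top-scoring (erroneous) one.
                for f, v in ngram_features(oracle).items():
                    weights[f] += v
                for f, v in ngram_features(best).items():
                    weights[f] -= v
    return weights
```

In practice, averaged perceptron weights and a tuned scale on the baseline recognizer score are typically used; the sketch omits these for brevity.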
