Semi-supervised and unsupervised discriminative language model training for automatic speech recognition

We investigate supervised, semi-supervised and unsupervised training of discriminative language models (DLMs). We use supervised and unsupervised confusion models to generate artificial data. We propose three target output selection methods for unsupervised DLM training. The ranking perceptron performs better than the structured perceptron in most cases. Significant gains in ASR accuracy are obtained with unmatched acoustic and text data.

Discriminative language modeling aims to reduce error rates by rescoring the output of an automatic speech recognition (ASR) system. Discriminative language model (DLM) training conventionally follows a supervised approach, using acoustic recordings together with their manual transcriptions (references) as training data, and recognition performance improves as the amount of such matched data increases. In this study we investigate the case where matched data for DLM training is limited or not available at all, and explore methods to improve ASR accuracy by incorporating acoustic and text data that come from separate sources. For semi-supervised training, we use a confusion model to generate artificial hypotheses in place of real ASR N-best lists. For unsupervised training, we propose three target output selection methods to stand in for the missing reference. We treat this task both as a structured prediction problem and as a reranking problem, and employ two different variants of the word error rate (WER)-sensitive perceptron algorithm. We show that a significant improvement over baseline ASR accuracy is obtained even when no transcribed acoustic data is available to train the DLM.
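To make the rescoring idea concrete, the following is a minimal sketch of a structured perceptron that reranks N-best lists with an update scaled by the difference in word errors. It is an illustration under stated assumptions, not the implementation used in the study: the unigram and bigram features, the fixed baseline weight, the function names (`edit_distance`, `features`, `train`, `rerank`) and the toy data are all hypothetical. The `target` argument can be a manual reference or, in the unsupervised setting, whatever output is selected to stand in for it.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def features(hyp, base_score):
    """Assumed feature set: baseline score plus unigram and bigram counts."""
    feats = Counter()
    feats["__base__"] = base_score
    tokens = ["<s>"] + hyp + ["</s>"]
    for a, b in zip(tokens, tokens[1:]):
        feats[("bi", a, b)] += 1
    for t in hyp:
        feats[("uni", t)] += 1
    return feats


def score(weights, feats):
    """Linear model score of one hypothesis."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train(nbest_lists, targets, epochs=5):
    """Perceptron training; the update is scaled by the word-error difference."""
    weights = {"__base__": 1.0}  # baseline weight kept fixed (an assumption)
    for _ in range(epochs):
        for nbest, target in zip(nbest_lists, targets):
            cands = [(features(hyp, s), edit_distance(target, hyp))
                     for hyp, s in nbest]
            oracle = min(cands, key=lambda c: c[1])   # fewest word errors
            best = max(cands, key=lambda c: score(weights, c[0]))
            loss = best[1] - oracle[1]
            if loss > 0:
                # Move toward the oracle and away from the current best.
                for feats, sign in ((oracle[0], 1.0), (best[0], -1.0)):
                    for k, v in feats.items():
                        if k != "__base__":
                            weights[k] = weights.get(k, 0.0) + sign * loss * v
    return weights


def rerank(weights, nbest):
    """Return the token list of the highest-scoring hypothesis."""
    return max(nbest, key=lambda c: score(weights, features(c[0], c[1])))[0]


if __name__ == "__main__":
    # Toy N-best list: (tokens, baseline log-score); the baseline prefers the error.
    nbest = [("the cat sad".split(), -1.0), ("the cat sat".split(), -1.2)]
    w = train([nbest], ["the cat sat".split()])
    print(" ".join(rerank(w, nbest)))
```

A ranking variant would instead compare pairs of hypotheses within each list and update whenever a pair is ordered inconsistently with its word-error counts; the sketch above covers only the structured-prediction view.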
