A Decade of Discriminative Language Modeling for Automatic Speech Recognition

This paper summarizes research on discriminative language modeling, focusing on its application to automatic speech recognition (ASR). A discriminative language model (DLM) is typically a linear or log-linear model in which a weight vector is applied to a feature-vector representation of a sentence. This flexible representation can include linguistically and statistically motivated features that incorporate morphological and syntactic information. At test time, DLMs are used to rerank the output of an ASR system, represented as an N-best list or a lattice. During training, both negative and positive examples are used, with the aim of directly optimizing the error rate. Various machine learning methods, including the structured perceptron, large-margin methods, and maximum regularized conditional log-likelihood, have been used to estimate the parameters of DLMs. Typically, positive examples for DLM training come from manual transcriptions of acoustic data, while negative examples are obtained by processing the same acoustic data with an ASR system. Recent research generalizes DLM training by either using automatic transcriptions for the positive examples or simulating the negative examples.
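To make the reranking setup concrete, the following is a minimal sketch of a linear DLM trained with a plain (unaveraged) structured perceptron on N-best lists. It is an illustration under stated assumptions, not the method of any particular paper surveyed here: the function names, the unigram and bigram feature set, the fixed baseline-score weight of 1.0, the choice of the lowest-error hypothesis in the list as the positive example, and the toy data are all hypothetical.

```python
from collections import Counter


def ngram_features(sentence, max_order=2):
    """Count n-gram features (orders 1..max_order) of a whitespace-tokenized sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    feats = Counter()
    for order in range(1, max_order + 1):
        for i in range(len(tokens) - order + 1):
            feats[" ".join(tokens[i:i + order])] += 1
    return feats


def score(weights, hyp, base_weight=1.0):
    """Linear model score: weighted baseline ASR score plus weights dotted with features."""
    text, base_score = hyp
    feats = ngram_features(text)
    return base_weight * base_score + sum(
        weights.get(f, 0.0) * v for f, v in feats.items()
    )


def word_errors(reference, hypothesis):
    """Word-level edit distance between a reference and a hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, rw in enumerate(ref, 1):
        cur = [i]
        for j, hw in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (rw != hw)))
        prev = cur
    return prev[-1]


def train_perceptron(data, epochs=5):
    """Perceptron training: move weights toward the lowest-error hypothesis
    and away from the current top-scoring one whenever they differ."""
    weights = {}
    for _ in range(epochs):
        for reference, nbest in data:
            oracle = min(nbest, key=lambda h: word_errors(reference, h[0]))
            best = max(nbest, key=lambda h: score(weights, h))
            if best[0] != oracle[0]:
                for f, v in ngram_features(oracle[0]).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in ngram_features(best[0]).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights


def rerank(weights, nbest):
    """Return the text of the highest-scoring hypothesis in an N-best list."""
    return max(nbest, key=lambda h: score(weights, h))[0]


# Hypothetical toy data: (manual transcription, [(hypothesis, baseline ASR score), ...])
toy_data = [
    ("the cat sat", [("the cat sad", -1.0), ("the cat sat", -1.5)]),
]

if __name__ == "__main__":
    learned = train_perceptron(toy_data)
    print(rerank(learned, toy_data[0][1]))
```

In this sketch the manual transcription only determines which hypothesis in the list serves as the positive example; the remaining hypotheses act as negative examples. The averaged perceptron, large-margin updates, or a regularized conditional log-likelihood objective could replace the update rule in `train_perceptron` without changing the feature extraction or the reranking step.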
