Deriving conversation-based features from unlabeled speech for discriminative language modeling

The perceptron algorithm was used in [1] to estimate discriminative language models that correct errors in the output of ASR systems. In its simplest version, the algorithm increases the weights of n-gram features that appear in the correct (oracle) hypothesis and decreases the weights of n-gram features that appear in the 1-best hypothesis. In this paper, we show that the perceptron algorithm can be used successfully in a semi-supervised learning (SSL) framework, where only limited amounts of labeled data are available. Our framework resembles graph-based label propagation [2] in that a graph is built based on the proximity of unlabeled conversations and is then used to propagate confidences (in the form of features) to the labeled data, on which the perceptron trains a discriminative model. The novelty of our approach is that the confidence “flows” from the unlabeled data to the labeled data, and not vice versa, as is traditionally done in SSL. Experiments conducted at the 2011 CLSP Summer Workshop on the conversational telephone speech corpora Dev04f and Eval04f demonstrate the effectiveness of the proposed approach.
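
The simplest perceptron update described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names `ngram_counts` and `perceptron_update`, the restriction to unigrams and bigrams, the unit step size, and the toy hypotheses are all assumptions made for illustration. A full system would also select the 1-best hypothesis under the current model score, which is omitted here.

```python
from collections import Counter


def ngram_counts(words, max_order=2):
    """Count all n-grams up to max_order in a list of words (hypothetical helper)."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def perceptron_update(weights, oracle, one_best, step=1.0):
    """Raise weights of n-grams in the oracle, lower those in the 1-best.

    N-grams that occur equally often in both hypotheses cancel out
    and leave the weight vector unchanged.
    """
    oracle_feats = ngram_counts(oracle)
    best_feats = ngram_counts(one_best)
    for feat in set(oracle_feats) | set(best_feats):
        delta = oracle_feats[feat] - best_feats[feat]
        if delta:
            weights[feat] = weights.get(feat, 0.0) + step * delta
    return weights


# Toy example (assumed data): the recognizer output "sad" where the oracle has "sat".
weights = perceptron_update({}, "the cat sat".split(), "the cat sad".split())
```

In this toy case only the n-grams that differ between the two hypotheses receive nonzero weights; the shared n-grams, such as the unigram "the", are untouched.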

[1] Izhak Shafran, et al. Corrective Models for Speech Recognition of Inflected Languages, 2006, EMNLP.

[2] Bernhard Schölkopf, et al. Introduction to Semi-Supervised Learning, 2006, Semi-Supervised Learning.

[3] Aren Jansen, et al. Estimating document frequencies in a speech corpus, 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[4] Michael Collins, et al. Trigger-Based Language Modeling using a Loss-Sensitive Perceptron Algorithm, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[5] Brian Roark, et al. Discriminative n-gram language modeling, 2007, Comput. Speech Lang.

[6] S. Khudanpur, et al. Large-scale Discriminative n-gram Language Models for Statistical Machine Translation, 2008, AMTA.

[7] Nasser M. Nasrabadi, et al. Pattern Recognition and Machine Learning, 2006, Technometrics.

[8] Andreas Stolcke, et al. Web resources for language modeling in conversational speech recognition, 2007, TSLP.

[9] Michael Collins, et al. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, 2002, EMNLP.

[10] Brian Roark, et al. Discriminative Syntactic Language Modeling for Speech Recognition, 2005, ACL.

[11] J. Bilmes, et al. Scaling Up Machine Learning: Parallel Graph-Based Semi-Supervised Learning, 2011.

[12] Inderjit S. Dhillon, et al. Weighted Graph Cuts without Eigenvectors: A Multilevel Approach, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Geoffrey Zweig, et al. Advances in speech transcription at IBM under the DARPA EARS program, 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[14] Ebru Arisoy, et al. Discriminative Language Modeling With Linguistic and Statistically Derived Features, 2012, IEEE Transactions on Audio, Speech, and Language Processing.