Training of error-corrective model for ASR without using audio data

This paper introduces a method for training an error-corrective model for Automatic Speech Recognition (ASR) without using audio data. Existing techniques assume that sufficient audio data from the target application is available and that negative samples can be prepared by having the ASR system recognize this audio data. However, this assumption does not always hold. We propose generating the probable N-best lists that the ASR system may produce directly from the text data of the target application, taking phoneme similarity into consideration. We call this process “Pseudo-ASR”. We then perform discriminative reranking with the error-corrective model, treating the text data as positive samples and the N-best lists from the Pseudo-ASR as negative samples. Experiments with Japanese call-center data showed that discriminative reranking based on the Pseudo-ASR improved the accuracy of the ASR.
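The following is a minimal sketch of the two stages the abstract describes: a “Pseudo-ASR” step that corrupts the phoneme sequence of an in-domain sentence via a phoneme-similarity table to simulate recognition errors, and a perceptron-style update that treats the original sentence as the positive sample and the simulated N-best hypotheses as negatives. Everything here is an illustrative assumption rather than the paper's implementation: the confusion table, the names (PHONEME_CONFUSIONS, pseudo_asr_nbest, perceptron_update), the substitution probability, and the unigram feature set are all invented for exposition.

```python
# Sketch of Pseudo-ASR negative-sample generation plus discriminative
# reranking training, under the assumptions stated above.
import random
from collections import defaultdict

# Hypothetical table of acoustically similar phonemes (assumption,
# not from the paper).
PHONEME_CONFUSIONS = {
    "s": ["sh", "z"],
    "b": ["p", "d"],
    "i": ["e"],
}

def corrupt(phonemes, sub_prob=0.2):
    """Randomly substitute phonemes with acoustically similar ones."""
    return [
        random.choice(PHONEME_CONFUSIONS[p])
        if p in PHONEME_CONFUSIONS and random.random() < sub_prob
        else p
        for p in phonemes
    ]

def pseudo_asr_nbest(phonemes, n=10, trials=50):
    """Sample an N-best list of plausible misrecognitions of a sentence."""
    hyps = {tuple(corrupt(phonemes)) for _ in range(trials)}
    hyps.discard(tuple(phonemes))  # keep only genuine error hypotheses
    return [list(h) for h in sorted(hyps)[:n]]

def features(tokens):
    """Unigram count features over a hypothesis (a stand-in for the
    richer n-gram features a real reranker would use)."""
    f = defaultdict(int)
    for t in tokens:
        f[t] += 1
    return f

def perceptron_update(weights, positive, negatives, lr=1.0):
    """One perceptron step: if the best-scoring negative sample ties or
    beats the positive sample, move the weights toward the positive
    features and away from the negative ones."""
    if not negatives:
        return
    def score(toks):
        return sum(weights[k] * v for k, v in features(toks).items())
    best_neg = max(negatives, key=score)
    if score(best_neg) >= score(positive):
        for k, v in features(positive).items():
            weights[k] += lr * v
        for k, v in features(best_neg).items():
            weights[k] -= lr * v

# Usage: each in-domain sentence, rendered as phonemes, supplies one
# positive sample and one simulated N-best list of negatives.
weights = defaultdict(float)
reference = ["k", "o", "s", "i", "b", "a"]
negatives = pseudo_asr_nbest(reference)
for _ in range(5):
    perceptron_update(weights, reference, negatives)
```

A full system would presumably rerank word-level N-best lists derived from these phoneme-level confusions; the sketch scores phoneme tokens directly only to stay self-contained.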
