N-best entropy based data selection for acoustic modeling

This paper presents a strategy for efficiently selecting informative data from large corpora of untranscribed speech. Confidence-based selection methods (i.e., selecting the utterances we are least confident about) have been a popular approach; however, they consider only the top hypothesis when selecting utterances and tend to select outliers, and therefore do not always improve overall recognition accuracy. We instead propose a method that selects data by looking at competing hypotheses, computing the entropy of the N-best hypotheses decoded by the baseline acoustic model. In addition, we address the issue of outliers by computing, via a tf-idf score, how representative each utterance is of all the other unselected utterances. Experiments show that N-best entropy based selection (5.8% relative improvement on a 400-hour corpus) outperforms the conventional confidence-based and lattice-entropy-based selection strategies, and that tf-idf based representativeness improves the model further (6.2% relative). A comparison with random selection is also presented. Finally, the impact of model size is discussed.
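To make the two selection criteria concrete, below is a minimal Python sketch. The abstract does not give the paper's exact formulation, so the softmax normalization of decoder scores, the tokenization of utterances (here, words of the top hypothesis), and the mean-cosine-similarity definition of representativeness are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def nbest_entropy(log_likelihoods):
    """Shannon entropy of an utterance's N-best list.

    `log_likelihoods` holds one decoder score per N-best hypothesis
    (an assumed input); they are softmax-normalized into a posterior
    over the N hypotheses. High entropy means many closely competing
    hypotheses, i.e., an informative utterance to transcribe.
    """
    m = max(log_likelihoods)
    unnorm = [math.exp(s - m) for s in log_likelihoods]  # shift for stability
    z = sum(unnorm)
    probs = [p / z for p in unnorm]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def tfidf_vectors(token_seqs):
    """One tf-idf vector per utterance over its token sequence
    (e.g., the words of its top decoding hypothesis)."""
    n = len(token_seqs)
    df = Counter()
    for seq in token_seqs:
        df.update(set(seq))
    vecs = []
    for seq in token_seqs:
        tf = Counter(seq)
        vecs.append({t: (c / len(seq)) * math.log(n / df[t])
                     for t, c in tf.items()} if seq else {})
    return vecs

def representativeness(i, vecs):
    """Mean cosine similarity of utterance i to every other utterance;
    a low score flags a likely outlier."""
    def cos(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na > 0 and nb > 0 else 0.0
    others = [v for j, v in enumerate(vecs) if j != i]
    return sum(cos(vecs[i], v) for v in others) / len(others) if others else 0.0
```

A final selection score would combine the two quantities, e.g., entropy weighted by representativeness; since the abstract does not specify the combination, any particular weighting here would also be an assumption.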
