Fast and flexible Kullback-Leibler divergence based acoustic modeling for non-native speech recognition

One of the main challenges in non-native speech recognition is how to handle the acoustic variability present in multi-accented non-native speech with a limited amount of training data. In this paper, we investigate an approach that addresses this challenge by using Kullback-Leibler divergence based hidden Markov models (KL-HMM). More precisely, the acoustic variability in the multi-accented speech is handled by using multilingual phoneme posterior probabilities, estimated by a multilayer perceptron trained on auxiliary data, as input features for the KL-HMM system. Given the limited training data, we then build better acoustic models by exploiting the fact that the KL-HMM system has fewer parameters. On the HIWIRE corpus, the proposed approach yields a word error rate (WER) of 1.9% with 149 minutes of training data and a WER of 5.5% with only 2 minutes of training data.
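As a rough illustration of the kind of local score used in KL-HMM acoustic modeling (a minimal sketch, not the system described in this paper), the snippet below assumes each HMM state d is parameterized by a categorical distribution y_d over multilingual phoneme classes and each frame supplies an MLP posterior vector z_t; one common formulation scores a state-frame pair by the Kullback-Leibler divergence KL(y_d || z_t). The function name, epsilon floor, and example values are hypothetical.

    import numpy as np

    def kl_local_score(y_d, z_t, eps=1e-10):
        """KL(y_d || z_t) between a state's categorical distribution and a
        multilingual phoneme posterior feature vector (both sum to 1)."""
        y = np.clip(np.asarray(y_d, dtype=float), eps, None)
        z = np.clip(np.asarray(z_t, dtype=float), eps, None)
        return float(np.sum(y * np.log(y / z)))

    # Example with 3 phoneme classes (hypothetical values):
    state_dist = [0.7, 0.2, 0.1]   # trained KL-HMM state parameters
    posterior  = [0.6, 0.3, 0.1]   # MLP posterior output for one frame
    print(kl_local_score(state_dist, posterior))

Because each state is described only by a small categorical distribution rather than, for instance, Gaussian mixture parameters, the number of trainable parameters stays low, which is what makes the approach attractive when training data is scarce.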