Acoustic model selection using limited data for accent robust speech recognition

This paper investigates techniques to compensate for the effects of regional accents of British English on automatic speech recognition (ASR) performance. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation, or to use accent identification (AID) to identify the speaker's accent and then apply accent-dependent ASR? Three approaches to accent-dependent modelling are investigated: using the 'correct' accent model, choosing a model using supervised (ACCDIST-based) AID, and building a model using data from neighbouring speakers in 'AID space'. All of the methods outperform the accent-independent model, with relative reductions in ASR error rate of up to 44%. Using on average 43 s of speech to identify an appropriate accent-dependent model outperforms using the same speech for supervised speaker adaptation by 7%.
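The third approach, building a model from neighbouring speakers in 'AID space', can be pictured with a minimal sketch. It assumes each speaker is represented by a fixed-length vector and that closeness is measured by Euclidean distance; the paper's actual representation and distance measure may differ. The function name `nearest_speakers`, the speaker IDs, and the toy 2-D vectors are all hypothetical and chosen only for illustration.

```python
import math

def nearest_speakers(new_vec, train_vecs, n):
    """Return the IDs of the n training speakers closest to new_vec.

    Assumption: speakers are points in a common vector space and
    Euclidean distance is a reasonable notion of accent similarity.
    """
    ranked = sorted(train_vecs, key=lambda spk: math.dist(new_vec, train_vecs[spk]))
    return ranked[:n]

# Hypothetical toy data: four training speakers in a 2-D space.
train = {
    "spk_a": (0.0, 0.0),
    "spk_b": (1.0, 0.0),
    "spk_c": (5.0, 5.0),
    "spk_d": (0.5, 0.5),
}

# Vector for the new speaker, e.g. derived from a short speech sample.
selected = nearest_speakers((0.4, 0.4), train, 2)
print(selected)
```

In a full system, the speech of the selected speakers would then be pooled to train or adapt an acoustic model for the new speaker; that step is outside the scope of this sketch.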
