Speech Modulation Features for Robust Nonnative Speech Accent Detection

In this paper, we propose using speech modulation features for robust nonnative accent detection. The modulation spectrum carries long-term temporal information about speech and may discriminate between the accents of native and nonnative speakers. For each speech segment to be tested, we extract a 10-dimensional feature vector from the modulation spectrum and use it for model training and testing. The proposed modulation features are compared with other popular features, such as pitch and formants, on a nonnative French accent detection task. Results show that the modulation features yield good detection performance and are quite robust to channel distortions. In addition, when the test scores of the modulation features and the pitch features are combined, the error rate is reduced significantly further. The best equal error rate, 13.1%, is obtained by fusing the pitch-based and modulation-based systems.
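To make the evaluation metric concrete, the following is a minimal sketch of how an equal error rate could be computed from detection scores, and how the scores of two systems could be combined. It is an illustration under stated assumptions, not the procedure used in this paper: the function names `equal_error_rate` and `fuse`, the equal-weight linear fusion, and the convention that higher scores indicate nonnative speech are all hypothetical choices.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumed convention: a segment is accepted as nonnative when score >= threshold.
    Returns the mean of miss and false-alarm rates at the threshold where
    the two rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # Miss: a nonnative (target) segment scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a native (nontarget) segment scored at or above it.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if gap < best_gap:
            best_gap, eer = gap, (miss + fa) / 2
    return eer


def fuse(scores_a, scores_b, weight=0.5):
    """Linear score-level fusion of two systems (the weight is a hypothetical choice)."""
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]


# Toy scores, invented purely for illustration.
nonnative = [0.9, 0.8, 0.7, 0.3]
native = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(nonnative, native))
```

In practice, the scores of the two systems would normally be normalized to a comparable range before fusion, and the fusion weight would be tuned on held-out data.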
