Perceptual differentiation modeling explains phoneme mispronunciation by non-native speakers

One difficulty in second language (L2) learning is a weakness in discriminating acoustic variation within an L2 phoneme category from variation between different categories. In this paper, we describe a general method to quantitatively measure the perceptual difference between a group of native speakers and individual non-native speakers. Normally, this task requires subjective listening tests and/or a thorough linguistic study. We instead use a fully automated method based on a psycho-acoustic auditory model. For a given phoneme class, we measure the similarity between the Euclidean space spanned by the power spectrum of a native speech signal and the Euclidean space spanned by the auditory model output. We do the same for a non-native speech signal. By comparing the two similarity measurements, we identify problematic phonemes for a given speaker. To validate our method, we apply it to different groups of non-native speakers of various first language (L1) backgrounds. Our results agree with theoretical findings reported in the linguistics literature.
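The comparison described above can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used, so the sketch assumes one plausible choice: the Pearson correlation between pairwise Euclidean distances in the acoustic (power spectrum) domain and in the perceptual (auditory model output) domain. The function names, the `margin` threshold, and the input arrays are hypothetical and are not taken from the paper.

```python
import numpy as np


def pairwise_distances(frames):
    """Euclidean distances between all pairs of rows (upper triangle only)."""
    diff = frames[:, None, :] - frames[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    rows, cols = np.triu_indices(len(frames), k=1)
    return dist[rows, cols]


def space_similarity(acoustic, perceptual):
    """Assumed similarity measure: correlation of pairwise distances.

    acoustic:   (n_frames, n_bins) power spectra for one phoneme class
    perceptual: (n_frames, n_channels) auditory-model outputs, same frames
    """
    d_acoustic = pairwise_distances(acoustic)
    d_perceptual = pairwise_distances(perceptual)
    return float(np.corrcoef(d_acoustic, d_perceptual)[0, 1])


def problematic_phonemes(native_sim, nonnative_sim, margin=0.1):
    """Flag phonemes whose non-native similarity falls below the native one.

    Both arguments map phoneme labels to similarity scores; `margin` is a
    hypothetical threshold, not a value from the paper.
    """
    return sorted(
        phoneme
        for phoneme in native_sim
        if native_sim[phoneme] - nonnative_sim[phoneme] > margin
    )
```

In use, `space_similarity` would be computed per phoneme class once for the native group and once for each non-native speaker, and the two resulting dictionaries passed to `problematic_phonemes`.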
