Large-scale characterization of non-native Mandarin Chinese spoken by speakers of European origin: Analysis on iCALL

In this work, we analyze phonetic and prosodic pronunciation patterns in iCALL, a speech corpus designed to evaluate Mandarin mispronunciations by non-native speakers of European origin. The corpus addresses the lack of large-scale non-native corpora with comprehensive annotations for computer-assisted pronunciation training (CAPT). iCALL consists of 90,841 utterances from 305 speakers, with a total duration of 142 hours. The speakers come from diverse linguistic backgrounds, spanning Germanic, Romance, and Slavic native languages. The read utterances are phonetically balanced and carry phonetic, tonal, and fluency annotations. Our findings on iCALL reveal that (1) lexical tone errors are over six times more prevalent than phonetic errors; (2) French speakers are twice as likely as English speakers to mispronounce Tones 2, 3, and 4; (3) native speakers of Romance languages are more likely to make de-aspiration and aspiration mistakes; and (4) fluency scores correlate inversely with tone and phone error rates.
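The inverse relationship between fluency and error rate reported above can be illustrated with a minimal sketch. It assumes per-speaker error counts and a numeric fluency score are available; the field names, the helper functions `error_rate` and `pearson`, and all numbers below are hypothetical placeholders, not iCALL data or the authors' actual procedure.

```python
from math import sqrt

def error_rate(n_errors, n_units):
    """Fraction of annotated units (tones or phones) marked as mispronounced."""
    return n_errors / n_units if n_units else 0.0

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical per-speaker tallies (made up for illustration, NOT iCALL data).
speakers = [
    {"fluency": 4.5, "tone_errors": 30, "tones": 1000},
    {"fluency": 3.0, "tone_errors": 120, "tones": 1000},
    {"fluency": 2.0, "tone_errors": 260, "tones": 1000},
    {"fluency": 1.5, "tone_errors": 300, "tones": 1000},
]

fluency = [s["fluency"] for s in speakers]
tone_er = [error_rate(s["tone_errors"], s["tones"]) for s in speakers]

# A negative coefficient means higher fluency goes with a lower error rate.
r = pearson(fluency, tone_er)
print(f"Pearson r between fluency and tone error rate: {r:.2f}")
```

The same computation applies to phone error rates by swapping in phone-level counts. Pearson correlation is only one possible choice here; a rank-based measure may be preferable if fluency is scored on an ordinal scale.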
