Goodness of tone (GOT) for non-native Mandarin tone recognition

Lexical tone is one of the most challenging aspects of pronunciation in tonal language acquisition, and accurate tone production is especially difficult for learners whose native language is not tonal. In this paper, we propose Goodness of Tone (GOT), a confidence measure for tone recognition inspired by goodness of pronunciation (GOP). GOT is a vector representation of the confidence of each lexical tone for a given speech segment. The GOT measure is useful for tone recognition for three reasons: 1) Unlike other tonal features such as pitch or fundamental frequency variation, GOT integrates both phonetic and tonal information. 2) GOT uses competing tonal phones, which share the same phonetic label and differ only in tone label, as a reference for cohort normalization. 3) GOT concatenates the confidence scores of all possible lexical tones into one vector, making it easier to characterize error patterns in non-native tone production.
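The abstract does not give the GOT formula, so the following is only a minimal sketch of how a GOP-style, cohort-normalized tone confidence vector might be computed. It assumes that an acoustic model has already produced one segment log-likelihood per tonal variant of the same base phone, that there are five tone labels (four lexical tones plus the neutral tone), and that scores are normalized by segment duration as in GOP. The names `TONES` and `got_vector` are hypothetical and do not come from the paper.

```python
import math

# Assumption: tone labels 1-4 plus 5 for the neutral tone.
TONES = (1, 2, 3, 4, 5)


def got_vector(tone_loglik, num_frames):
    """Return one duration-normalized log-confidence per tone.

    tone_loglik: dict mapping each tone label to the segment log-likelihood
                 of the tonal phone that shares the base phone and differs
                 only in tone (the cohort of competing tonal phones).
    num_frames:  segment length in frames, used for GOP-style normalization.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")

    scores = [tone_loglik[t] for t in TONES]

    # Cohort normalization: log-sum-exp over the competing tonal phones,
    # computed in a numerically stable way.
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))

    # Each entry is a log-posterior-like score divided by the duration.
    return [(s - log_norm) / num_frames for s in scores]


# Hypothetical log-likelihoods for one 10-frame segment.
example = got_vector({1: -100.0, 2: -105.0, 3: -103.0, 4: -110.0, 5: -108.0}, 10)
best_tone = TONES[example.index(max(example))]
```

Under these assumptions, the tone with the highest entry is the recognized tone, while the full vector can be passed to a downstream classifier to capture which tones a learner tends to confuse.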
