Tokenizing fundamental frequency variation for Mandarin tone error detection

Tone errors are commonly observed in tonal language acquisition, and correct tone production is especially challenging for native speakers of non-tonal languages. In this paper, we exploit the fundamental frequency variation (FFV) feature for Mandarin tone error detection. We propose two ways of using FFV: (1) concatenating FFV features with standard speech recognition features; and (2) Token FFV, which characterizes pitch variation over a longer temporal context through GMM tokenization and n-gram language modeling. Our results show that incorporating FFV features improves tone error detection and that the two approaches are complementary.
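To make the Token FFV idea concrete, the following is a minimal sketch of GMM tokenization followed by n-gram scoring. It is not the paper's implementation: the two-component GMM with hand-set parameters, the one-dimensional toy frames standing in for FFV vectors, the bigram order, and the add-one smoothing are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def tokenize(frames, weights, means, variances):
    """Map each frame to the index of its highest-scoring GMM component."""
    tokens = []
    for x in frames:
        scores = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        tokens.append(max(range(len(scores)), key=scores.__getitem__))
    return tokens


def train_bigram(token_seqs, vocab_size):
    """Return an add-one-smoothed bigram log-probability function."""
    pair_counts = Counter()
    ctx_counts = Counter()
    for seq in token_seqs:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            ctx_counts[a] += 1

    def logprob(a, b):
        return math.log((pair_counts[(a, b)] + 1) / (ctx_counts[a] + vocab_size))

    return logprob


def score(seq, logprob):
    """Average bigram log-probability of a token sequence."""
    pairs = list(zip(seq, seq[1:]))
    return sum(logprob(a, b) for a, b in pairs) / max(len(pairs), 1)


# Assumed toy GMM: component 0 ~ "low" values, component 1 ~ "high" values.
weights = [0.5, 0.5]
means = [[-1.0], [1.0]]
variances = [[0.5], [0.5]]

# Toy 1-D frames standing in for FFV vectors of a rising contour.
frames = [[-1.2], [-0.8], [0.9], [1.1]]
tokens = tokenize(frames, weights, means, variances)

# Bigram model trained on token sequences assumed to come from correct rising tones.
rising_lm = train_bigram([[0, 0, 1, 1], [0, 1, 1, 1]], vocab_size=2)
rising_score = score(tokens, rising_lm)
falling_score = score([1, 1, 0, 0], rising_lm)
```

In this sketch, a token sequence that matches the contour the model was trained on receives a higher average log-probability than one moving in the opposite direction, so a low score under the expected tone's model can be read as evidence of a tone error. A real system would train the GMM on FFV frames and would likely use one n-gram model per tone.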
