Multi-layered features with SVM for Chinese accent identification

In this paper, we propose an approach to Chinese accent identification that combines multi-layered features with a support vector machine (SVM). The multi-layered features include both segmental and suprasegmental information, such as MFCC and pitch contour, to capture the diverse variations in Chinese accented speech. The pitch contour is estimated using a cubic polynomial method to model the characteristics that vary across Chinese accents. We train two GMM acoustic models to express the features of a given accent. Because the original decision criterion of the GMM cannot handle such multi-layered features, an SVM is used to make the decision. The effectiveness of the proposed approach was evaluated on the 863 Chinese accent corpus. Our approach yields a significant 10% relative error rate reduction in Chinese accented speech identification compared with traditional approaches that use a single feature at a single level.
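The cubic polynomial modelling of the pitch contour mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `cubic_pitch_features`, the convention that unvoiced frames carry an F0 of 0, and the normalization of time to [-1, 1] are all assumptions made for illustration. The sketch fits a third-order polynomial to the voiced frames of an F0 track and returns the four coefficients as a fixed-length suprasegmental feature.

```python
import numpy as np


def cubic_pitch_features(f0, degree=3):
    """Fit a polynomial to the voiced part of an F0 contour.

    f0: per-frame pitch values in Hz; frames with value 0 are treated as
        unvoiced (an assumption, not taken from the paper).
    Returns the polynomial coefficients, highest power first.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if voiced.sum() < degree + 1:
        raise ValueError("not enough voiced frames to fit the polynomial")
    # Normalize time to [-1, 1] so coefficients are comparable across
    # segments of different length (also an assumption).
    t = np.linspace(-1.0, 1.0, len(f0))
    return np.polyfit(t[voiced], f0[voiced], degree)
```

Under these assumptions, the resulting coefficient vector could be appended to segmental scores (for example, GMM log-likelihoods computed on MFCCs) to form the input to the SVM; how the paper actually combines the layers is not specified in the abstract.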
