Phonetic confusion analysis and robust phone set generation for Shanghai-accented Mandarin speech recognition

This paper discusses accent issues in Shanghai-accented Mandarin speech recognition. Phonetic confusion is analyzed in detail based on the alignment between the surface-form and baseform transcriptions. Mutual information is used as the measure for extracting the most confusable phoneme pairs. Each phoneme in such a pair was found to be easily misrecognized as the other. To remove this confusion, the two phonemes in each pair are best replaced by a single newly generated phoneme, and new phone sets are derived in this way. The phonetic confusion analysis and the experimental evaluation are performed on a Shanghai-accented Mandarin speech corpus. Experimental results show that, compared with the canonical phone set, the generated phone set greatly reduces substitution errors and achieves a 0.72% absolute reduction in Chinese character error rate (CER). When it is combined with pronunciation modeling, the absolute CER reduction is 1.58%.
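The abstract does not give the exact mutual-information criterion, so the following is only a minimal sketch under an assumed formulation, not the authors' implementation. It Levenshtein-aligns each baseform phone sequence with its surface-form transcription, estimates the mutual information I(B;S) between baseform phone B and surface phone S from the aligned pairs, and greedily merges the pair whose merger loses the least mutual information. A pair confused equally in both directions loses almost nothing when merged. The function names (`align`, `mutual_information`, `most_confusing_pair`, `derive_merges`) and the toy phones are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def align(base, surface):
    """Levenshtein-align two phone sequences and return the (baseform, surface)
    pairs on matched or substituted positions; insertions/deletions are dropped."""
    n, m = len(base), len(surface)
    # cost[i][j] = edit distance between base[:i] and surface[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (base[i - 1] != surface[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace from the end, preferring the diagonal (match/substitution) move.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + (base[i - 1] != surface[j - 1]):
            pairs.append((base[i - 1], surface[j - 1]))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1  # deletion of a baseform phone
        else:
            j -= 1  # insertion of a surface phone
    return pairs[::-1]


def mutual_information(pairs):
    """I(B;S) in bits, estimated from aligned (baseform, surface) phone pairs."""
    joint = Counter(pairs)
    total = sum(joint.values())
    p_base, p_surf = Counter(), Counter()
    for (b, s), c in joint.items():
        p_base[b] += c
        p_surf[s] += c
    return sum(
        (c / total) * math.log2(c * total / (p_base[b] * p_surf[s]))
        for (b, s), c in joint.items()
    )


def merge(pairs, a, b, new):
    """Replace phones a and b by the single symbol `new` on both sides."""
    def rename(p):
        return new if p in (a, b) else p
    return [(rename(x), rename(y)) for x, y in pairs]


def most_confusing_pair(pairs):
    """Return (a, b, loss): the phone pair whose merger loses the least I(B;S)."""
    phones = sorted({p for pair in pairs for p in pair})
    base_mi = mutual_information(pairs)
    best = None
    for a, b in combinations(phones, 2):
        loss = base_mi - mutual_information(merge(pairs, a, b, f"{a}+{b}"))
        if best is None or loss < best[2]:
            best = (a, b, loss)
    return best


def derive_merges(pairs, k):
    """Greedily merge the k most confusable pairs; return the merged pairs in order."""
    merges = []
    for _ in range(k):
        a, b, _loss = most_confusing_pair(pairs)
        merges.append((a, b))
        pairs = merge(pairs, a, b, f"{a}+{b}")
    return merges


if __name__ == "__main__":
    # Hypothetical toy data: z/zh confused in both directions, n/l partly confused.
    pairs = align(["zh", "i", "n", "i"], ["z", "i", "l", "i"])
    pairs += [("zh", "zh"), ("z", "zh"), ("z", "z"), ("l", "n"), ("s", "s"), ("sh", "sh")]
    print(pairs)
    print(derive_merges(pairs, 2))
```

Merging two phones applies the same deterministic mapping to B and S, so by the data-processing inequality the estimated I(B;S) can never increase. The loss is therefore always non-negative, and a loss near zero flags a pair whose members are practically interchangeable in the accented data.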