Standard Speaker Selection in Speech Synthesis for Mandarin Tone Learning

The teaching speech that L2 learners imitate plays a key role in learning Mandarin tones. Previous work has found that synthesized teaching speech is more acceptable to an L2 learner when it resembles the learner’s own speech. Voice modification technology can synthesize such teaching speech by combining standard Chinese speech with the learner’s speech. However, the choice of standard Chinese speaker also affects the quality of the synthesized speech. This paper studies how to select the standard Chinese speaker for teaching speech synthesis. Speaker features, including Mel-frequency cepstral coefficients (MFCC), pitch, and rhythm, are compared, and a Gaussian Mixture Model (GMM) is used to select the most appropriate Chinese speaker. Perceptual experiments show that modification using the Chinese speech most similar to the learner’s speech in MFCC yields the best teaching speech in both phonetic and tonal quality.
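The selection step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that one diagonal-covariance Gaussian Mixture Model is fitted to each candidate standard speaker's feature frames (for example MFCC vectors), and that the candidate whose model gives the learner's frames the highest average log-likelihood is chosen. The function names (`fit_diag_gmm`, `avg_log_likelihood`, `select_speaker`), the number of mixture components, and the scoring rule are all hypothetical choices made for illustration, and feature extraction is left out.

```python
import math
import random


def _log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_diag_gmm(frames, n_components=2, n_iter=20, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to a list of feature vectors with EM.

    Assumption: a small diagonal GMM is enough for this illustration.
    """
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    # Initialise means from random frames and variances from the global variance.
    means = [list(f) for f in rng.sample(frames, n_components)]
    g_mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    g_var = [
        max(sum((f[d] - g_mean[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    variances = [list(g_var) for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        resp = []
        for f in frames:
            logs = [
                math.log(weights[k]) + _log_gauss(f, means[k], variances[k])
                for k in range(n_components)
            ]
            total = _logsumexp(logs)
            resp.append([math.exp(v - total) for v in logs])
        # M-step: re-estimate weights, means, and floored variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-10  # guard against empty components
            weights[k] = nk / n
            means[k] = [
                sum(r[k] * f[d] for r, f in zip(resp, frames)) / nk
                for d in range(dim)
            ]
            variances[k] = [
                max(
                    sum(r[k] * (f[d] - means[k][d]) ** 2
                        for r, f in zip(resp, frames)) / nk,
                    var_floor,
                )
                for d in range(dim)
            ]
    return weights, means, variances


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of frames under a fitted GMM."""
    weights, means, variances = model
    total = 0.0
    for f in frames:
        total += _logsumexp([
            math.log(w) + _log_gauss(f, m, v)
            for w, m, v in zip(weights, means, variances)
        ])
    return total / len(frames)


def select_speaker(learner_frames, candidate_frames):
    """Return (best speaker name, scores) for a dict of name -> feature frames."""
    scores = {
        name: avg_log_likelihood(learner_frames, fit_diag_gmm(frames))
        for name, frames in candidate_frames.items()
    }
    return max(scores, key=scores.get), scores
```

Under these assumptions, calling `select_speaker(learner_mfcc, {"speaker_a": mfcc_a, "speaker_b": mfcc_b})` with lists of equal-length feature vectors returns the name of the closest candidate together with the per-candidate scores. The same routine could be applied to pitch or rhythm features to compare selection criteria.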