Heteronym Verification for Mandarin Speech Synthesis

Accurate phonetic transcription of speech corpus is critical to high quality speech synthesis. In Mandarin text-to-speech (MTTS) system, one major problem of automatically labeling the database is the heteronym annotation. Because in Mandarin, there are some single-character words or multi-character words have more than one pronunciation. In this paper, a heteronym annotation verification method for MTTS database labeling is proposed. By training contextual dependent HMMs and calculating the log likelihood ratio, each heteronym in the database is assigned a confidence score and those below the threshold are selected for manual inspecting. We divide heteronyms in Mandarin into two categories and different features are used for each category. The result of our experiment on an artificial test set has shown that we can achieve EER (equal error rate) of 7.9% and 11.9% for these two categories. Further test on an actual database which contains a total of 36098 heteronyms has shown that the proposed method can find 89 of all 123 annotation errors by only inspecting 639 polyphones.