Phonetic transcription verification with generalized posterior probability

Accurate phonetic transcription is critical to high quality concatenation based text-to-speech synthesis. In this paper, we propose to use generalized syllable posterior probability (GSPP) as a statistical confidence measure to verify errors in phonetic transcriptions, such as reading errors, inadequate alternatives of pronunciations in the lexicon, letter-to-sound errors in transcribing out-of-vocabulary words, idiosyncratic pronunciations, etc. in a TTS speech database. GSPP is computed based upon a syllable graph generated by a recognition decoder. Testing on two data sets, the proposed GSPP is shown to be effective in locating phonetic transcription errors. Equal error rates (EERs) of 8.2% and 8.4%, are obtained on two testing sets, respectively. It is also found that the GSPP verification performance is fairly stable over a wide range around the optimal value of acoustic model exponential weight used in computing GSPP.