Word identification method for Japanese text-to-speech conversion system

A new word identification method is proposed for determining text readings and their accentuation in a Japanese text-to-speech conversion system. Because the characteristics of Japanese writing result in an enormous number of word candidates, a dynamic programming technique is introduced to reduce the number of identification trials. Moreover, plausibility evaluation functions are used in the identification algorithm to reflect the characteristics of Japanese writing: grammatical word transition tendencies, the phrase number minimization principle, the advantage of longest matching, and a penalty for unusual kana (Japanese syllabary) expressions. When the method was applied to 299 Japanese sentences consisting of 6867 words, 95.2% of the words were correctly identified. More detailed analysis confirms that about two-thirds of the errors can be easily corrected by revising the word transition networks and enriching the word dictionaries. The remaining errors, however, can be resolved only by fine-grained semantic analysis. This experiment shows that the proposed word identification method is useful in a Japanese text-to-speech conversion system.
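
The general idea of combining dynamic programming with plausibility scores can be illustrated with a minimal sketch. It is not the paper's algorithm: the toy dictionary, the part-of-speech tags, the cost values, and the function name identify_words are all assumptions made for illustration. Only two of the four criteria are approximated here, namely word transition tendency (a cost table over tag pairs) and a preference for fewer, longer words (a fixed per-word cost); phrase number minimization and the kana penalty are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: surface form -> possible part-of-speech tags.
DICTIONARY: Dict[str, List[str]] = {
    "東京": ["noun"],
    "東": ["noun"],
    "京": ["noun"],
    "に": ["particle"],
    "行く": ["verb"],
}

# Hypothetical transition costs between tags (lower means more plausible).
TRANSITION_COST: Dict[Tuple[str, str], int] = {
    ("BOS", "noun"): 0,
    ("noun", "noun"): 2,
    ("noun", "particle"): 0,
    ("particle", "verb"): 0,
    ("particle", "noun"): 1,
}
DEFAULT_TRANSITION_COST = 5  # assumed cost for unlisted tag pairs
WORD_COST = 1  # assumed per-word cost; favors fewer, longer words


def identify_words(text: str) -> List[Tuple[str, str]]:
    """Return the lowest-cost (surface, tag) sequence, or [] if none exists."""
    n = len(text)
    # best[i][tag] = (cost, start of last word, tag of previous word)
    # for the cheapest analysis of text[:i] whose last word carries `tag`.
    best: List[Dict[str, Tuple[int, int, str]]] = [dict() for _ in range(n + 1)]
    best[0]["BOS"] = (0, -1, "")

    for end in range(1, n + 1):
        for start in range(end):
            if not best[start]:
                continue  # no analysis reaches this position
            surface = text[start:end]
            for tag in DICTIONARY.get(surface, []):
                for prev_tag, (prev_cost, _, _) in best[start].items():
                    step = TRANSITION_COST.get(
                        (prev_tag, tag), DEFAULT_TRANSITION_COST
                    )
                    cost = prev_cost + WORD_COST + step
                    if tag not in best[end] or cost < best[end][tag][0]:
                        best[end][tag] = (cost, start, prev_tag)

    if not best[n]:
        return []  # the toy dictionary cannot cover the whole text

    # Trace back from the cheapest final state.
    tag = min(best[n], key=lambda t: best[n][t][0])
    words: List[Tuple[str, str]] = []
    end = n
    while end > 0:
        _, start, prev_tag = best[end][tag]
        words.append((text[start:end], tag))
        end, tag = start, prev_tag
    words.reverse()
    return words


if __name__ == "__main__":
    print(identify_words("東京に行く"))
```

Because each table entry keeps only the cheapest analysis per position and tag, the work grows with the number of candidate words and tag pairs rather than with the number of complete segmentations, which is the kind of reduction in identification trials that a dynamic programming formulation is meant to provide.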