A Comparative Study of Unsupervised Grapheme-Phoneme Alignment Methods

This paper describes and compares two unsupervised algorithms to automatically align Japanese grapheme and phoneme strings, identifying segment-level correspondences between them. The first algorithm is inspired by the tf-idf model, including enhancements to handle phonological variation and determine frequency through analysis of "alignment potential". The second algorithm relies on the C4.5 classification system, and makes multiple passes over the alignment data until consistency of output is achieved. In evaluation, the first algorithm proves to be greatly superior to the second, producing a word accuracy of 96.94%.