Constructing High-Accuracy Letter-to-Phoneme Rules with Machine Learning

This chapter describes a machine learning approach to letter-to-sound conversion that builds upon and extends the pioneering NETtalk work of Sejnowski and Rosenberg. The extensions to the NETtalk system include a different learning algorithm, a wider input window, error-correcting output coding, a right-to-left scan of the word to be pronounced (with the results of each decision influencing subsequent decisions), and the addition of several useful input features. These changes yielded a system that performs much better than the original NETtalk. After training on 19,002 words, the system achieves 93.7% correct pronunciation of individual phonemes and 64.8% correct pronunciation of whole words (where the pronunciation must exactly match the dictionary pronunciation to be correct) on an unseen 1000-word test set. Based on the judgements of three human listeners in a blind assessment study, our system was estimated to have a serious error rate of 16.7% (on whole words) compared to 26.1% for the DECtalk 3.0 rule base.
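
The right-to-left scan over a letter window can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the system described in this chapter: the padding symbol, the window half-width, the number of earlier decisions fed back, and the `classify` interface are all hypothetical, and the learning algorithm, output coding, and extra input features are not shown.

```python
from typing import Callable, List, Sequence

PAD = "_"  # hypothetical padding symbol for positions outside the word


def letter_window(word: str, i: int, half_width: int) -> str:
    """Return the letters centered on position i, padded at the word edges."""
    padded = PAD * half_width + word + PAD * half_width
    return padded[i : i + 2 * half_width + 1]


def pronounce_right_to_left(
    word: str,
    classify: Callable[[str, Sequence[str]], str],  # hypothetical interface
    half_width: int = 3,  # assumed value, chosen only for illustration
    n_prev: int = 2,      # assumed number of earlier decisions fed back
) -> List[str]:
    """Assign one phoneme per letter, scanning from the last letter to the first.

    Each call to `classify` sees the letter window and the phonemes already
    chosen for the letters immediately to the right, so earlier decisions
    can influence later ones.
    """
    phonemes = [""] * len(word)
    for i in range(len(word) - 1, -1, -1):
        window = letter_window(word, i, half_width)
        decided = phonemes[i + 1 : i + 1 + n_prev]
        phonemes[i] = classify(window, decided)
    return phonemes
```

Any callable that maps a window and the already-decided phonemes to a single phoneme can stand in for `classify`. The result is returned in left-to-right order even though the decisions are made right to left.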