Inference of letter-phoneme correspondences by delimiting and dynamic time warping techniques

An algorithm is described for inferring correspondences between letters and phonemes from a large set of word spellings and their associated phonemic forms. The algorithm uses two techniques: delimiting and dynamic time warping (DTW). The first technique delimits the part of a word's spelling and pronunciation that cannot be aligned using the existing set of correspondences. The second technique derives correspondences from that delimited part. The inferred correspondences are evaluated in terms of translation performance on unseen words, proper names and novel words. This performance is compared with that obtained using manually derived correspondences, which serve as the benchmark. Nonparametric statistical tests are used to establish whether the performance of the inferred correspondences differs significantly from that of the manually derived correspondences.
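
The DTW step can be illustrated with a minimal sketch. The abstract does not specify the local cost function, the step pattern, or how the warping path is turned into correspondences, so everything below is an assumption made for illustration: the local cost is 0 for a letter-phoneme pair already in a hypothetical `known` set and 1 otherwise, the allowed steps are the three standard DTW moves, and letters that share a phoneme on the path are merged into one correspondence. The function names `dtw_align` and `group_correspondences` and the phoneme symbols are likewise hypothetical. For brevity the example aligns a whole word, whereas the described algorithm applies DTW only to the delimited part.

```python
def dtw_align(letters, phonemes, known):
    """Return a minimum-cost warping path as (letter_index, phoneme_index) pairs.

    Assumed local cost: 0 if the (letter, phoneme) pair is in `known`, else 1.
    """
    n, m = len(letters), len(phonemes)
    inf = float("inf")
    # cost[i][j]: best cumulative cost of aligning letters[:i+1] with phonemes[:j+1]
    cost = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            local = 0 if (letters[i], phonemes[j]) in known else 1
            if i == 0 and j == 0:
                cost[i][j] = local
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))  # diagonal
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))  # extra letter
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))  # extra phoneme
            # min() keeps the first minimal candidate, so ties prefer the diagonal
            best, prev = min(candidates, key=lambda c: c[0])
            cost[i][j] = best + local
            back[i][j] = prev
    # Trace back from the final cell to the start
    path = []
    cell = (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    return path


def group_correspondences(letters, phonemes, path):
    """Merge the letters aligned to each phoneme into one (letters, phoneme) pair."""
    groups = {}
    for i, j in path:
        groups.setdefault(j, []).append(i)
    return [
        ("".join(letters[i] for i in idxs), phonemes[j])
        for j, idxs in sorted(groups.items())
    ]


# Hypothetical example: "phone" with an assumed phoneme string and seed set
word = "phone"
pron = ["f", "oU", "n"]
known = {("o", "oU"), ("n", "n")}
path = dtw_align(word, pron, known)
print(group_correspondences(word, pron, path))
# prints [('ph', 'f'), ('o', 'oU'), ('ne', 'n')]
```

With these assumptions the unaligned letters "ph" are attached to the phoneme "f", which is the kind of multi-letter correspondence the abstract aims to infer. A letter that maps to several phonemes appears once per phoneme in this grouping, so a fuller treatment would need a separate rule for that case.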