Comparison of applying Pair HMMs and DBN models in Transliteration Identification

Transliteration is used to handle unknown words in Cross Language Information Retrieval (CLIR) and Machine Translation (MT). Most transliteration tasks depend on a similarity estimation stage, in which a model identifies a transliteration match for a given source word. In this paper, we evaluate the application of two related frameworks to transliteration identification. Both frameworks model string similarity as the cost incurred through a series of edit operations. One framework implements Pair Hidden Markov Models (Pair HMMs) (Mackay and Kondrak 2005), while the other implements classes of Dynamic Bayesian Network (DBN) models (Filali and Bilmes 2005). For each Pair HMM, we adapt different algorithms for computing transliteration similarity estimates. For the DBN framework, we modify the DBN classes of Filali and Bilmes (2005) and specify models from these classes to represent factorizations that we hypothesize could affect the value of a transliteration similarity estimate. In separate tests, models from both frameworks achieve high transliteration identification accuracy in an experimental setup for Russian-English transliteration. An inspection of the output of models from the two frameworks suggests that combining models could further improve transliteration identification accuracy.
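
To illustrate how a similarity estimate can be computed over edit operations, the following is a minimal sketch of a forward-style computation for a three-state Pair HMM with substitution (M), deletion (X), and insertion (Y) states. It is not the configuration evaluated in this paper: the transition values `delta`, `epsilon`, and `tau`, the hand-picked character correspondences, and the names `pair_score`, `GAP_SCORE`, `forward_similarity`, and `best_match` are all assumptions made for illustration. The emission scores are toy constants rather than a normalized distribution estimated from data, so the result should be read as a relative score and not as a probability.

```python
# Toy Cyrillic-to-Latin correspondences (hypothetical, hand-picked for illustration).
LIKELY_PAIRS = set(zip("акмопрст", "akmoprst"))

GAP_SCORE = 0.05  # hypothetical emission score for an insertion or a deletion


def pair_score(a, b):
    """Hypothetical substitution-state emission score for a character pair."""
    return 0.5 if (a, b) in LIKELY_PAIRS else 0.01


def forward_similarity(src, tgt, delta=0.1, epsilon=0.2, tau=0.1):
    """Sum the scores of all edit sequences that turn src into tgt.

    delta:   M -> X and M -> Y transition (opening a gap), assumed value
    epsilon: X -> X and Y -> Y transition (extending a gap), assumed value
    tau:     transition to the end state, assumed value
    """
    n, m = len(src), len(tgt)
    # f_m, f_x, f_y hold, for each cell (i, j), the total score of all edit
    # sequences that consume src[:i] and tgt[:j] and end in state M, X, or Y.
    f_m = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_x = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_y = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_m[0][0] = 1.0  # the start is treated as the substitution state

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # substitution: consume one character from each word
                f_m[i][j] = pair_score(src[i - 1], tgt[j - 1]) * (
                    (1 - 2 * delta - tau) * f_m[i - 1][j - 1]
                    + (1 - epsilon - tau) * (f_x[i - 1][j - 1] + f_y[i - 1][j - 1])
                )
            if i > 0:  # deletion: consume a source character only
                f_x[i][j] = GAP_SCORE * (
                    delta * f_m[i - 1][j] + epsilon * f_x[i - 1][j]
                )
            if j > 0:  # insertion: consume a target character only
                f_y[i][j] = GAP_SCORE * (
                    delta * f_m[i][j - 1] + epsilon * f_y[i][j - 1]
                )

    return tau * (f_m[n][m] + f_x[n][m] + f_y[n][m])


def best_match(src, candidates):
    """Identify the candidate with the highest similarity score for src."""
    return max(candidates, key=lambda cand: forward_similarity(src, cand))
```

Under these assumed scores, `best_match("мост", ["park", "most", "mast"])` selects the candidate whose characters align with the toy correspondences. In a trained model the emission and transition parameters would be estimated from transliteration pairs rather than fixed by hand.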