Paper information - Comparison of applying Pair HMMs and DBN models in Transliteration Identification

Transliteration helps handle unknown words in Cross Language Information Retrieval (CLIR) and Machine Translation (MT). Most transliteration tasks depend on a similarity estimation stage, in which a model identifies a transliteration match for a given source word. In this paper, we evaluate the application of two related frameworks to transliteration identification. Both frameworks model string similarity as the cost incurred through a series of edit operations. One framework implements Pair Hidden Markov Models (Pair HMMs) (Mackay and Kondrak 2005), while the other implements classes of Dynamic Bayesian Network (DBN) models (Filali and Bilmes 2005). For each Pair HMM, we adapt different algorithms for computing transliteration similarity estimates. For the DBN framework, we modify the DBN classes of Filali and Bilmes (2005) and specify models from those classes to represent factorizations that we hypothesize could affect the value of a transliteration similarity estimate. In separate tests, models from both frameworks achieve high transliteration identification accuracy on an experimental Russian-English transliteration setup. An inspection of the outputs of models from the two frameworks suggests that combining models could further improve transliteration identification accuracy.
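
To illustrate the idea of scoring string similarity as the cost of a series of edit operations, here is a minimal sketch in Python. It is not the Pair HMM or DBN models evaluated in the paper: it uses plain dynamic programming with fixed, hand-picked costs, whereas the paper's models estimate their parameters from data. The function names `edit_cost` and `best_match`, the unit costs, and the romanized example words are all assumptions made for illustration.

```python
def edit_cost(source, target, sub_cost=1.0, indel_cost=1.0):
    """Minimum total cost of substitutions, insertions and deletions
    that turn `source` into `target` (costs are illustrative, not learned)."""
    rows, cols = len(source) + 1, len(target) + 1
    cost = [[0.0] * cols for _ in range(rows)]

    # Turning a prefix into the empty string (or back) needs only indels.
    for i in range(1, rows):
        cost[i][0] = i * indel_cost
    for j in range(1, cols):
        cost[0][j] = j * indel_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + (0.0 if same else sub_cost),  # match or substitute
                cost[i - 1][j] + indel_cost,                       # delete from source
                cost[i][j - 1] + indel_cost,                       # insert into source
            )
    return cost[-1][-1]


def best_match(source, candidates):
    """Pick the candidate with the lowest edit cost to `source`."""
    return min(candidates, key=lambda cand: edit_cost(source, cand))


if __name__ == "__main__":
    # Hypothetical romanized example; real Russian-English data would
    # need costs defined across the two alphabets.
    print(best_match("moskva", ["moscow", "minsk", "kiev"]))
```

In a probabilistic framework such as a Pair HMM, the fixed costs above would be replaced by learned emission and transition probabilities, and the score would be a likelihood over alignments instead of a minimum cost.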

Peter Nabende

[1] Grzegorz Kondrak, et al. Computing Word Similarity and Identifying Cognates with Pair Hidden Markov Models, 2005, CoNLL.

[2] Geoffrey Zweig, et al. Speech Recognition with Dynamic Bayesian Networks, 1998, AAAI/IAAI.

[3] Jian Su, et al. A Joint Source-Channel Model for Machine Transliteration, 2004, ACL.

[4] John Nerbonne, et al. Inducing Sound Segment Differences Using Pair Hidden Markov Models, 2007, SIGMORPHON.

[5] Stuart J. Russell, et al. Dynamic Bayesian networks: representation, inference and learning, 2002.

[6] Kevin Knight, et al. Machine Transliteration, 1997, CL.

[7] Eunok Paek, et al. An English to Korean Transliteration Model of Extended Markov Window, 2000, COLING.

[8] Jörg Tiedemann, et al. Pair Hidden Markov Model for Named Entity Matching, 2008, SCSS.

[9] Karim Filali, et al. A Dynamic Bayesian Framework to Model Context and Memory in Edit Distance Learning: An Application to Pronunciation Classification, 2005, ACL.

[10] Grzegorz Kondrak, et al. Evaluation of Several Phonetic Similarity Algorithms on the Task of Cognate Identification, 2006.