Improving Transliteration Accuracy Using Word-Origin Detection and Lexicon Lookup

We propose a transliteration framework with three components: (i) a word-origin detection engine (pre-processing), (ii) a CRF-based transliteration engine, and (iii) a re-ranking model based on lexicon lookup (post-processing). Results for English-Hindi and English-Kannada transliteration show that the pre-processing and post-processing modules improve top-1 accuracy by 7.1%.
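The three-stage flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the suffix heuristic in `detect_origin`, the per-origin model dictionary, the additive `bonus` in `rerank_with_lexicon`, and the stub models standing in for the CRF engine are all hypothetical choices made for illustration only.

```python
from typing import Callable, Dict, List, Set, Tuple

Candidate = Tuple[str, float]  # (target-script string, model score)
Model = Callable[[str], List[Candidate]]


def detect_origin(word: str) -> str:
    """Pre-processing: guess the word's origin.

    Assumption: a toy suffix heuristic stands in for the real detector.
    """
    indic_suffixes = ("pur", "bad", "esh", "appa", "amma")
    return "indic" if word.lower().endswith(indic_suffixes) else "western"


def rerank_with_lexicon(candidates: List[Candidate], lexicon: Set[str],
                        bonus: float = 1.0) -> List[Candidate]:
    """Post-processing: boost candidates found in a target-language lexicon.

    Assumption: a fixed additive bonus; the sort is stable, so candidates
    with equal scores keep the order the model produced.
    """
    rescored = [(text, score + (bonus if text in lexicon else 0.0))
                for text, score in candidates]
    return sorted(rescored, key=lambda c: c[1], reverse=True)


def transliterate(word: str, models: Dict[str, Model],
                  lexicon: Set[str]) -> List[Candidate]:
    """Run origin detection, an origin-specific model, then re-ranking."""
    origin = detect_origin(word)
    candidates = models[origin](word)  # stand-in for the CRF engine
    return rerank_with_lexicon(candidates, lexicon)


# Stub models with hard-coded candidates, used only to exercise the flow.
def stub_indic_model(word: str) -> List[Candidate]:
    return [("कनपुर", 0.6), ("कानपुर", 0.4)]


def stub_western_model(word: str) -> List[Candidate]:
    return [("लंदन", 0.9)]


MODELS: Dict[str, Model] = {"indic": stub_indic_model,
                            "western": stub_western_model}
LEXICON: Set[str] = {"कानपुर"}

if __name__ == "__main__":
    for text, score in transliterate("Kanpur", MODELS, LEXICON):
        print(text, round(score, 2))
```

In this sketch the lexicon hit moves the second-ranked candidate to the top, which is the kind of correction a lexicon-lookup re-ranker is meant to provide; how the actual system combines model scores with lexicon evidence is not stated in the abstract.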
