Statistical Method for English to Kannada Transliteration

Language transliteration is an important area of natural language processing. Machine transliteration is the conversion of a character or word from one language to another without losing its phonological characteristics. It is both an orthographic and a phonetic conversion process, so both grapheme and phoneme information should be considered. Accurate transliteration of named entities plays an important role in the performance of machine translation and cross-language information retrieval. A transliteration model must be designed so that the phonetic structure of words is preserved as closely as possible. This paper addresses the problem of transliterating English into Kannada using a publicly available statistical machine translation (SMT) toolkit. Evaluated on English-to-Kannada transliteration, the technique produced exact Kannada transliterations for 89.27% of English names. The results of the proposed model are compared with those of an SVM-based transliteration system and of the Google Indic transliteration system.
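The Python below is a minimal sketch of one common way to apply an SMT toolkit to transliteration. It treats each name as a "sentence" and each letter as a "word". The abstract does not describe the paper's actual preprocessing, so this setup is an assumption. The sketch converts English-Kannada name pairs into character-level parallel lines and computes exact-match accuracy, the metric the reported 89.27% figure describes. The function names, the lowercasing of English input, and the rule that attaches Kannada vowel signs and the virama to the preceding letter are hypothetical illustrations, not the authors' method.

```python
import unicodedata


def to_char_tokens(word: str) -> str:
    """Split a word into space-separated units so an SMT toolkit treats
    each letter (not each word) as a token.

    Assumption: combining marks (Unicode categories Mn and Mc), such as
    Kannada vowel signs and the virama, are attached to the preceding
    base character instead of forming their own token.
    """
    units: list[str] = []
    for ch in word:
        if units and unicodedata.category(ch) in ("Mn", "Mc"):
            units[-1] += ch  # attach dependent sign to its base letter
        else:
            units.append(ch)
    return " ".join(units)


def make_parallel_corpus(pairs: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Turn (English, Kannada) name pairs into two aligned lists of lines,
    one line per name, in the shape phrase-based SMT trainers expect.
    Lowercasing the English side is an illustrative assumption."""
    src = [to_char_tokens(en.lower()) for en, _ in pairs]
    tgt = [to_char_tokens(kn) for _, kn in pairs]
    return src, tgt


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of names whose predicted transliteration equals the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    # Toy pair for illustration only; not data from the paper.
    src, tgt = make_parallel_corpus([("Rama", "\u0cb0\u0cbe\u0cae")])
    print(src)  # ['r a m a']
    print(tgt)
    print(exact_match_accuracy(["\u0cb0\u0cbe\u0cae"], ["\u0cb0\u0cbe\u0cae"]))  # 1.0
```

Splitting at the character level lets the toolkit's alignment and phrase-extraction steps learn letter-group correspondences, such as English "sh" mapping to the single Kannada letter ಶ, rather than whole-name mappings. This sketch does not keep consonant clusters joined by the virama together as single units. A fuller preprocessing step might do so.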