Urdu Hindi Machine Transliteration using SMT

Transliteration is the process of transcribing a word of the source language into the target language such that, when a native speaker of the target language pronounces it, it sounds like the native pronunciation of the source word. Statistical techniques have brought significant advances in various fields of Natural Language Processing (NLP). In this paper, we analyse the application of Statistical Machine Translation (SMT) to the problem of Urdu Hindi transliteration using a parallel lexicon. We designed a total of 24 Statistical Transliteration (ST) systems by combining different types of alignments, translation models and target language models. We performed a total of 576 experiments and report significant results. For Hindi-to-Urdu transliteration, we achieved a maximum word-level accuracy of 71.5%. For Urdu-to-Hindi transliteration, the maximum word-level accuracy is 77.8% when the input Urdu text contains all necessary diacritical marks and 77% when it does not. At the character level, transliteration accuracy is more than 90% in both directions.
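The abstract reports accuracy at both the word level and the character level without defining either measure. The following Python sketch shows one common way such scores could be computed. It is an illustration under stated assumptions, not the evaluation code used in the paper: word-level accuracy is taken to be the fraction of exact matches, character-level accuracy is taken to be one minus the total edit distance divided by the total reference length, and the function names and romanized sample words are hypothetical.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(hyps: Sequence[str], refs: Sequence[str]) -> float:
    """Fraction of outputs that match the reference exactly (assumed definition)."""
    if len(hyps) != len(refs):
        raise ValueError("hypotheses and references must have the same length")
    if not refs:
        return 0.0
    return sum(h == r for h, r in zip(hyps, refs)) / len(refs)


def char_accuracy(hyps: Sequence[str], refs: Sequence[str]) -> float:
    """1 - (total edit distance / total reference characters), floored at 0.

    This is an assumed definition; the paper may use a different one.
    Combining marks such as vowel signs and diacritics count as separate characters.
    """
    if len(hyps) != len(refs):
        raise ValueError("hypotheses and references must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        return 0.0
    errors = sum(edit_distance(h, r) for h, r in zip(hyps, refs))
    return max(0.0, 1.0 - errors / total_chars)


# Hypothetical romanized sample: system outputs and reference transliterations.
hyps = ["kitab", "dost", "pani"]
refs = ["kitab", "dost", "paani"]
print(f"word-level accuracy: {word_accuracy(hyps, refs):.3f}")
print(f"character-level accuracy: {char_accuracy(hyps, refs):.3f}")
```

Under these definitions a single wrong character makes the whole word count as an error at the word level but costs only one character at the character level, which is consistent with character-level scores being higher than word-level scores.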
