Transliteration Systems across Indian Languages Using Parallel Corpora

Hindi is the lingua franca of India. Although non-native speakers can communicate well in Hindi, only a few can read and write it. In this work, we aim to bridge this gap by building transliteration systems that transliterate Hindi into at least seven other Indian languages. The systems are developed as a reading aid for non-Hindi readers and are trained on transliteration pairs extracted automatically from parallel corpora. All the transliteration systems perform well enough for a non-Hindi reader to understand a Hindi text.
