Direct orthographical mapping for machine transliteration

Machine transliteration/back-transliteration plays an important role in many multilingual speech and language applications. In this paper, a novel framework for machine transliteration/back-transliteration that allows us to carry out direct orthographical mapping (DOM) between two different languages is presented. Under this framework, a joint source-channel transliteration model, also called n-gram transliteration model (n-gram TM), is further proposed to model the transliteration process. We evaluate the proposed methods through several transliteration/back-transliteration experiments for English/Chinese and English/Japanese language pairs. Our study reveals that the proposed method not only reduces an extensive system development effort but also improves the transliteration accuracy significantly.

[1]  Wei Gao,et al.  Phoneme-Based Transliteration of Foreign Names for OOV Problem , 2004, IJCNLP.

[2]  Stanley F. Chen,et al.  An empirical study of smoothing techniques for language modeling , 1999 .

[3]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[4]  Key-Sun Choi,et al.  Automatic Transliteration and Back-transliteration by Decision Tree Learning , 2000, LREC.

[5]  R. Schwartz,et al.  The N-best algorithms: an efficient and exact procedure for finding the N most likely sentence hypotheses , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[6]  Berlin Chen,et al.  Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..

[7]  Sanjeev Khudanpur,et al.  Transliteration of Proper Names in Cross-Lingual Information Retrieval , 2003, NER@ACL.

[8]  Key-Sun Choi,et al.  An English-Korean Transliteration Model Using Pronunciation and Contextual Rules , 2002, COLING.

[9]  Karin M. Verspoor,et al.  Automatic English-Chinese name transliteration for development of multilingual resources , 1998, ACL.

[10]  Hozumi Tanaka,et al.  Improving Back-Transliteration by Combining Information Sources , 2004, IJCNLP.

[11]  Eunok Paek,et al.  An English to Korean Transliteration Model of Extended Markov Window , 2000, COLING.

[12]  Yaser Al-Onaizan,et al.  Translating Named Entities Using Monolingual and Bilingual Resources , 2002, ACL.

[13]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[14]  Jason S. Chang,et al.  Acquisition of English-Chinese Transliterated Word Pairs from Parallel-Aligned Texts using a Statistical Machine Transliteration Model , 2003, ParallelTexts@NAACL-HLT.

[15]  Eric Brill,et al.  Automatically Harvesting Katakana-English Term Pairs from Search Engine Query Logs , 2001, NLPRS.