Using Semantic Knowledge in the Uyghur-Chinese Person Name Transliteration

In this paper, we propose a transliteration approach based on semantic information (i.e., language origin and gender) that is automatically learnt from the person name, with the aim of transliterating Uyghur person names into Chinese. The proposed approach integrates semantic scores (i.e., performance on language origin and gender detection) with a general transliteration model, yielding a semantic knowledge-based model that can produce the best candidate transliteration results. In the experiments, we use datasets containing person names of two language origins: Uyghur and Chinese. The results show that the proposed semantic transliteration model substantially outperforms the general transliteration model and greatly improves mean reciprocal rank (MRR) on both datasets, and that it can aid in developing more efficient transliteration for named entities.
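
The MRR metric mentioned above can be illustrated with a minimal sketch. It assumes the standard definition (the reciprocal of the rank at which the reference transliteration first appears in each candidate list, averaged over all names, with 0 when it is absent). The function name and the placeholder strings are hypothetical and are not taken from the paper, whose exact evaluation setup may differ.

```python
from typing import Sequence


def mean_reciprocal_rank(
    ranked_candidates: Sequence[Sequence[str]],
    references: Sequence[str],
) -> float:
    """Average of 1/rank of the reference in each ranked candidate list.

    Assumption: one reference per source name; a name whose reference
    does not appear among its candidates contributes 0.
    """
    if not references:
        return 0.0
    total = 0.0
    for candidates, reference in zip(ranked_candidates, references):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == reference:
                total += 1.0 / rank
                break
    return total / len(references)


# Placeholder data (not real transliterations): the reference is ranked
# 1st for the first name, 2nd for the second, and missing for the third.
example_candidates = [["x", "y", "z"], ["y", "x"], ["q"]]
example_references = ["x", "x", "z"]
example_mrr = mean_reciprocal_rank(example_candidates, example_references)
```

Under this definition, a model that moves the correct transliteration closer to the top of its candidate list raises MRR even when top-1 accuracy is unchanged.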
