Paper Information - Cluster-specific Named Entity Transliteration

Cluster-specific Named Entity Transliteration

Existing named entity (NE) transliteration approaches often apply a single general model to all NEs, regardless of their origins. As a result, a Chinese name and a French name (assuming the latter has already been rendered in Chinese) are both translated into English with the same model, which often leads to unsatisfactory performance. In this paper we propose a cluster-specific NE transliteration framework. We group name origins into a smaller number of clusters, then train transliteration and language models for each cluster under a statistical machine translation framework. Given a source NE, we first select the appropriate models by classifying the NE into its most likely cluster, then transliterate it with the corresponding models. We also propose a phrase-based name transliteration model, which effectively incorporates context information for transliteration. Our experiments show a substantial improvement in transliteration accuracy over a state-of-the-art baseline system, reducing the transliteration character error rate from 50.29% to 12.84%.
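The classify-then-transliterate flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the add-one smoothed character bigram model (`CharBigramLM`) stands in for a cluster's language model, the greedy longest-match lookup (`transliterate`) stands in for phrase-based decoding under a statistical machine translation framework, and the two clusters, their training names, and their phrase tables are made-up toy data.

```python
import math
from collections import Counter


class CharBigramLM:
    """Add-one smoothed character bigram model (hypothetical stand-in
    for a cluster-specific language model)."""

    def __init__(self, names):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.vocab = set()
        for name in names:
            chars = ["<s>"] + list(name) + ["</s>"]
            self.vocab.update(chars)
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, name):
        chars = ["<s>"] + list(name) + ["</s>"]
        v = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.contexts[p] + v))
            for p, c in zip(chars, chars[1:])
        )


def classify(name, cluster_lms):
    """Pick the cluster whose model assigns the name the highest score."""
    return max(cluster_lms, key=lambda c: cluster_lms[c].log_prob(name))


def transliterate(name, table):
    """Greedy longest-match lookup in a cluster's phrase table
    (toy stand-in for phrase-based decoding)."""
    out, i = [], 0
    while i < len(name):
        for j in range(len(name), i, -1):
            if name[i:j] in table:
                out.append(table[name[i:j]])
                i = j
                break
        else:
            out.append(name[i])  # pass unknown characters through unchanged
            i += 1
    return "".join(out)


def transliterate_ne(name, cluster_lms, cluster_tables):
    """Select the cluster first, then use only that cluster's table."""
    cluster = classify(name, cluster_lms)
    return cluster, transliterate(name, cluster_tables[cluster])


# Toy data, invented for this sketch only.
TRAINING_NAMES = {
    "chinese": ["王小明", "李明", "张伟"],
    "european": ["雅克", "希拉克", "皮埃尔"],
}
PHRASE_TABLES = {
    "chinese": {"王": "wang", "李": "li", "张": "zhang",
                "小": "xiao", "明": "ming", "伟": "wei"},
    "european": {"雅克": "jacques", "希拉克": "chirac", "皮埃尔": "pierre"},
}
CLUSTER_LMS = {c: CharBigramLM(n) for c, n in TRAINING_NAMES.items()}

if __name__ == "__main__":
    for ne in ["希拉克", "李小明"]:
        print(ne, transliterate_ne(ne, CLUSTER_LMS, PHRASE_TABLES))
```

In this sketch the cluster decision is a hard choice made before decoding, so an error in `classify` sends the name to the wrong phrase table; that trade-off is a property of the toy code and says nothing about how the paper handles ambiguous names.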

Fei Huang

[1] Stephan Vogel, et al. Improved named entity translation and bilingual named entity extraction, 2002, Proceedings. Fourth IEEE International Conference on Multimodal Interfaces.

[2] Yaser Al-Onaizan, et al. Translating Named Entities Using Monolingual and Bilingual Resources, 2002, ACL.

[3] Berlin Chen, et al. Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval, 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.

[4] Sanjeev Khudanpur, et al. Transliteration of Proper Names in Cross-Lingual Information Retrieval, 2003, NER@ACL.

[6] Alexander H. Waibel, et al. Automatic Extraction of Named Entity Translingual Equivalence Based on Multi-Feature Cost Minimization, 2003, NER@ACL.

[7] Jason S. Chang, et al. Acquisition of English-Chinese Transliterated Word Pairs from Parallel-Aligned Texts using a Statistical Machine Transliteration Model, 2003, ParallelTexts@NAACL-HLT.

[8] Daniel Marcu, et al. A Phrase-Based, Joint Probability Model for Statistical Machine Translation, 2002, EMNLP.

[9] Hermann Ney, et al. Improved Alignment Models for Statistical Machine Translation, 1999, EMNLP.

[10] Dekai Wu, et al. Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora, 1997, CL.

[11] Gregory Grefenstette, et al. Finding Ideographic Representations of Japanese Names Written in Latin Script via Language Identification and Corpus Validation, 2004, ACL.

[12] Kevin Knight, et al. Machine Transliteration, 1997, CL.