Translating-transliterating named entities for multilingual information access

monolingual named entities. Extending them to multilingual entities is becoming important because a large volume of multilingual material is generated and disseminated over the Web. The fundamental issues in processing multilingual named entities are recognizing them and finding their correspondences across languages. Embedded technologies include learning formulation and transformation rules for multilingual named entities, and translating and transliterating named entities and content words. The common named entities mentioned at the 7th Message Understanding Conference (MUC, 1998), including date/time expressions and monetary and percentage expressions, have fixed patterns, so mapping them between languages is easy. This work focuses on entities with more flexible patterns, such as person, location, and organization names. Transforming a named entity from one language into another is not purely translation or purely transliteration; the mapping may combine meaning translation and phoneme transliteration (Chen, Yang, & Lin, 2003). The following five English–Chinese examples clarify this issue, where the symbol A ⇔ B denotes that foreign name A is translated and/or transliterated into a Chinese name B.

[1]  Hsin-Hsi Chen,et al.  Cross-language information access to multilingual collections on the internet , 2000, J. Am. Soc. Inf. Sci..

[2]  James Allan,et al.  Topic detection and tracking: event-based information organization , 2002 .

[3]  Hsin-Hsi Chen,et al.  An NLP & IR approach to topic detection , 2002 .

[4]  Hsin-Hsi Chen,et al.  Resolving Translation Ambiguity and Target Polysemy in Cross-Language Information Retrieval , 1999, ACL.

[5]  Limsoon Wong,et al.  Accomplishments and challenges in literature data mining for biology , 2002, Bioinform..

[6]  Hsin-Hsi Chen,et al.  Cross-Language Image Retrieval via Spoken Query , 2004, RIAO.

[7]  Hsin-Hsi Chen,et al.  Backward Machine Transliteration by Learning Phonetic Similarity , 2002, CoNLL.

[8]  David G. Stork,et al.  Pattern Classification (2nd ed.) , 1999 .

[9]  Kevin Knight,et al.  Translating Names and Technical Terms in Arabic Text , 1998, SEMITIC@COLING.

[10]  Hsin-Hsi Chen,et al.  Clustering and Visualization in a Multi-lingual Multi-document Summarization System , 2003, ECIR.

[11]  Alan W. Black,et al.  Letter to sound rules for accented lexicon compression , 1998, ICSLP.

[12]  Pascale Fung,et al.  An IR Approach for Translating New Words from Nonparallel, Comparable Texts , 1998, ACL.

[13]  Karin M. Verspoor,et al.  Automatic English-Chinese name transliteration for development of multilingual resources , 1998, ACL.

[14]  Mark Sanderson,et al.  Eurovision – an image-based CLIR system , 2002 .

[15]  Sanjeev Khudanpur,et al.  Transliteration of Proper Names in Cross-Lingual Information Retrieval , 2003, NER@ACL.

[16]  Ariadna Font Llitjós,et al.  Knowledge of language origin improves pronunciation accuracy of proper names , 2001, INTERSPEECH.

[17]  Paul Thompson,et al.  Name Searching and Information Retrieval , 1997, EMNLP.

[18]  Hsin-Hsi Chen,et al.  Learning Formulation and Transformation Rules for Multilingual Named Entities , 2003, NER@ACL.

[19]  Eero Sormunen,et al.  End-User Searching Challenges Indexing Practices in the Digital Newspaper Photo Archive , 2004, Information Retrieval.

[20]  Hsin-Hsi Chen,et al.  反向異文字音譯相似度評量方法與跨語言資訊檢索 (Similarity Measure in Backward Transliteration between Different Character Sets and Its Application to CLIR) [In Chinese] , 2000, ROCLING/IJCLCLP.

[21]  Shigeo Abe.  Pattern Classification , 2001, Springer London.

[22]  Hsin-Hsi Chen,et al.  Proper Name Translation in Cross-Language Information Retrieval , 1998, COLING-ACL.

[23]  Sung-Hyon Myaeng,et al.  Automatic identification and back-transliteration of foreign words for information retrieval , 1999, Inf. Process. Manag..

[24]  SHIH,et al.  Named Entity Extraction for Information Retrieval , 2002 .

[25]  Douglas W. Oard,et al.  Cross-language Information Retrieval , 2021, ArXiv.

[26]  Hsin-Hsi Chen,et al.  Enhancing performance of protein and gene name recognizers with filtering and integration strategies , 2004, J. Biomed. Informatics.

[27]  Stephen E. Robertson,et al.  Okapi at TREC-7: Automatic Ad Hoc, Filtering, VLC and Interactive , 1998, TREC.

[28]  Michael Biehl,et al.  On-Line Learning with a Perceptron , 1994 .

[29]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[30]  Yaser Al-Onaizan,et al.  Translating Named Entities Using Monolingual and Bilingual Resources , 2002, ACL.

[31]  Berlin Chen,et al.  Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '01).

[32]  Hsin-Hsi Chen,et al.  Foreign Name Backward Transliteration in Chinese-English Cross-Language Image Retrieval , 2003, CLEF.

[33]  Ruizhang Huang,et al.  Mining events and new name translations from online daily news , 2004, Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries.

[34]  Kevin Knight,et al.  Machine Transliteration , 1997, CL.