Fusion of Multiple Features and Ranking SVM for Web-based English-Chinese OOV Term Translation

This paper addresses Web-based English-Chinese translation of out-of-vocabulary (OOV) terms, with particular emphasis on a translation selection strategy that fuses multiple features and a ranking mechanism based on Ranking Support Vector Machine (Ranking SVM). Experiments on the CoNLL-2003 corpus for the English Named Entity Recognition (NER) task and on a set of selected new terms give consistent results across the different data sources, indicating that our OOV term translation model is better able to select the most probable translation candidates. When the model is combined with English-Chinese Cross-Language Information Retrieval (CLIR) on Text REtrieval Conference (TREC) data sets, clear improvements are obtained in both query translation and retrieval performance.
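
As a rough illustration of how a Ranking SVM can order translation candidates from several fused features, the following minimal sketch trains a linear ranker on pairwise feature differences. It is not the model described in this paper: the two features (a co-occurrence score and a transliteration similarity), the toy data, and all function names are hypothetical, and plain subgradient descent on the hinge loss stands in for a dedicated SVM solver.

```python
from itertools import combinations


def pairwise_differences(candidates):
    """Turn one term's candidates [(features, relevance), ...] into
    (feature_difference, label) pairs; ties in relevance are skipped."""
    pairs = []
    for (xa, ya), (xb, yb) in combinations(candidates, 2):
        if ya == yb:
            continue
        diff = [a - b for a, b in zip(xa, xb)]
        pairs.append((diff, 1 if ya > yb else -1))
    return pairs


def train_ranking_svm(queries, epochs=200, lr=0.1, reg=0.01):
    """Learn a weight vector w by subgradient descent on the
    L2-regularized pairwise hinge loss (a simple stand-in for an SVM solver)."""
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    pairs = [p for q in queries for p in pairwise_differences(q)]
    for _ in range(epochs):
        for diff, label in pairs:
            margin = label * sum(wi * di for wi, di in zip(w, diff))
            for i in range(dim):
                grad = reg * w[i]
                if margin < 1:  # pair is misordered or inside the margin
                    grad -= label * diff[i]
                w[i] -= lr * grad
    return w


def rank(w, candidates):
    """Sort (name, features) candidates by descending linear score."""
    def score(c):
        return sum(wi * xi for wi, xi in zip(w, c[1]))
    return sorted(candidates, key=score, reverse=True)


# Hypothetical training data: one list per OOV term; each candidate is
# ([co-occurrence score, transliteration similarity], relevance).
training = [
    [([0.9, 0.8], 1), ([0.4, 0.3], 0), ([0.2, 0.5], 0)],
    [([0.7, 0.9], 1), ([0.6, 0.2], 0), ([0.1, 0.1], 0)],
]
weights = train_ranking_svm(training)

# Rank unseen (hypothetical) candidates for a new term.
ranked = rank(weights, [("candidate_B", [0.3, 0.2]), ("candidate_A", [0.8, 0.7])])
print([name for name, _ in ranked])
```

The pairwise transformation is what distinguishes a ranking formulation from ordinary classification: the learner only sees which of two candidates for the same term should be preferred, so feature scales are compared within a term rather than across terms.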
