Learning weights for translation candidates in Japanese-Chinese information retrieval

This paper describes our Japanese-Chinese information retrieval system, which takes the "query-translation" approach. The system translates query terms using both a conventional bilingual Japanese-Chinese dictionary and Wikipedia, and we propose that Wikipedia can serve as a good bilingual dictionary for named entities (NEs). Exploiting the nature of the Japanese writing system, we further propose that query terms be processed differently depending on the script in which they are written. For weight tuning and term disambiguation, we use an iterative method based on the PageRank algorithm. Evaluated on the NTCIR-5 test set, our system achieves relax MAP (mean average precision) scores of up to 0.2217 on T-runs and 0.2276 on D-runs.
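The iterative, PageRank-style weight tuning mentioned above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function name `tune_weights`, the update rule (old weight plus weighted support from the candidates of the other query terms, then per-term normalization), the association scores, and the toy data are all assumptions made for illustration.

```python
def tune_weights(candidates, assoc, iterations=10):
    """Iteratively re-weight translation candidates (illustrative sketch).

    candidates: dict mapping each source query term to its list of
                translation candidates.
    assoc:      dict mapping frozenset({t1, t2}) to a hypothetical
                association strength between two candidates
                (e.g. a co-occurrence score); missing pairs count as 0.
    """
    # Start with uniform weights over each term's candidates.
    weights = {s: {t: 1.0 / len(ts) for t in ts} for s, ts in candidates.items()}

    for _ in range(iterations):
        updated = {}
        for s, ts in candidates.items():
            scores = {}
            for t in ts:
                # Support from candidates of all *other* source terms,
                # scaled by their current weights.
                support = sum(
                    assoc.get(frozenset((t, t2)), 0.0) * w2
                    for s2, ws in weights.items() if s2 != s
                    for t2, w2 in ws.items()
                )
                scores[t] = weights[s][t] + support
            # Normalize so each term's candidate weights sum to 1.
            total = sum(scores.values())
            updated[s] = {t: v / total for t, v in scores.items()}
        weights = updated
    return weights


# Toy data (hypothetical): term "s1" has two candidates, "s2" has one.
candidates = {"s1": ["a1", "a2"], "s2": ["b1"]}
assoc = {frozenset(("a1", "b1")): 0.9, frozenset(("a2", "b1")): 0.1}
weights = tune_weights(candidates, assoc)
```

In this toy run, candidate `a1` is strongly associated with the only candidate of the other term, so its weight grows at the expense of `a2`; the resulting weights could then be attached to the translated query terms at retrieval time.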
