Towards effective strategies for monolingual and bilingual information retrieval: Lessons learned from NTCIR-4

At the NTCIR-4 workshop, Justsystem Corporation (JSC) and Clairvoyance Corporation (CC) collaborated in the cross-language information retrieval (CLIR) task. Our goal was to evaluate the performance and robustness of our recently developed commercial-grade CLIR systems for English and Asian languages. The main contribution of this article is an investigation of different retrieval strategies, their interactions in both monolingual and bilingual retrieval tasks, and their respective contributions to operational retrieval systems in the context of NTCIR-4. We report results for Japanese and English monolingual retrieval and for Japanese-to-English bilingual retrieval. In the monolingual retrieval analysis, we examine two special properties of the NTCIR experimental design (two levels of relevance and identical queries in multiple languages) and explore how they interact with the strategies of our retrieval system, including pseudo-relevance feedback, multi-word term down-weighting, and term weight merging. Our analysis shows that the choice of language (English or Japanese) does not have a significant impact on retrieval performance, that query expansion is slightly more effective with relaxed judgments than with rigid judgments, and that the weights of multi-word terms should be lowered for better retrieval performance. In the bilingual retrieval analysis, we aim to identify robust strategies that are effective both when used alone and when used in combination with other strategies. We examine strategies specific to cross-lingual retrieval, such as translation disambiguation and translation structuring, as well as general strategies such as pseudo-relevance feedback and multi-word term down-weighting. For the shorter title topics, pseudo-relevance feedback is a major performance enhancer, but translation structuring affects retrieval performance negatively, whether used alone or in combination with other strategies. For the longer description topics, all of the strategies tested improve retrieval performance, with pseudo-relevance feedback and translation structuring as the major contributors.
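
To make two of the general strategies named above concrete, the following is a minimal sketch of multi-word term down-weighting and pseudo-relevance feedback. It is not the JSC/CC system: the scoring function, the function names, the rule that a term containing a space counts as multi-word, and all parameter values (`factor`, `top_docs`, `top_terms`, `feedback_weight`) are assumptions chosen only for illustration.

```python
from collections import Counter


def score(query_weights, doc_terms):
    """Toy relevance score: sum of query weight times term frequency."""
    tf = Counter(doc_terms)
    return sum(w * tf[t] for t, w in query_weights.items())


def downweight_multiword(query_weights, factor=0.5):
    """Scale down the weight of multi-word terms.

    Assumption: a term is 'multi-word' if it contains a space.
    """
    return {t: (w * factor if " " in t else w) for t, w in query_weights.items()}


def expand_with_prf(query_weights, docs, top_docs=2, top_terms=2, feedback_weight=0.5):
    """Pseudo-relevance feedback.

    Treat the top-ranked documents as if they were relevant and add
    their most frequent terms that are not already in the query.
    """
    ranked = sorted(docs, key=lambda d: score(query_weights, d), reverse=True)
    counts = Counter()
    for doc in ranked[:top_docs]:
        counts.update(t for t in doc if t not in query_weights)
    expanded = dict(query_weights)
    for term, _ in counts.most_common(top_terms):
        expanded[term] = feedback_weight
    return expanded


# Hypothetical toy collection: each document is a list of indexed terms.
docs = [
    ["information retrieval", "query", "translation", "dictionary"],
    ["query", "translation", "dictionary", "bilingual"],
    ["weather", "forecast", "rain"],
]
query = {"information retrieval": 1.0, "query": 1.0}

lowered = downweight_multiword(query)
expanded = expand_with_prf(lowered, docs)
print(lowered)
print(sorted(expanded))
```

In this toy setup the multi-word term is halved before feedback, and the expansion terms are drawn from the two top-ranked documents. A real system would use a proper ranking function and a tuned term-selection criterion.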
