Context-based generic cross-lingual retrieval of documents and automated summaries

We develop a context-based generic cross-lingual retrieval model that can handle different language pairs. The model takes context into account during query translation, exploiting contexts in both the query and the documents through co-occurrence statistics gathered from passages of different granularity. We also investigate cross-lingual retrieval of automatically generated generic summaries. We have implemented the model for two cross-lingual settings: retrieving Chinese documents with English queries and retrieving English documents with Chinese queries. Extensive experiments on a large-scale parallel corpus allow us to study retrieval performance in both settings, for full-length documents as well as automated summaries.
