Document Expansion Based on WordNet for Robust IR

The use of semantic information to improve IR is a long-standing goal. This paper presents a novel Document Expansion method that uses a WordNet-based system to find related concepts and words. Expansion words are indexed separately, and when combined with the regular index, they improve the results on three datasets over a state-of-the-art IR engine. Because many IR systems are not robust, in the sense that they need careful fine-tuning and optimization of their parameters, we explored several parameter settings. The results show that our method is especially effective for realistic, non-optimal settings, adding robustness to the IR engine. We also explored the effect of document length, and show that our method is especially successful with shorter documents.

Arantxa Otegi | Xabier Arregi | Eneko Agirre

[1] Ellen M. Voorhees, et al. Query expansion using lexical-semantic relations, 1994, SIGIR '94.

[2] Julio Gonzalo, et al. Indexing with WordNet synsets can improve text retrieval, 1998, WordNet@ACL/COLING.

[3] Amit Singhal, et al. Document expansion for speech retrieval, 1999, SIGIR '99.

[4] Taher H. Haveliwala. Topic-sensitive PageRank, 2002, IEEE Trans. Knowl. Data Eng.

[5] John Tait, et al. Word sense disambiguation in information retrieval revisited, 2003, SIGIR.

[6] Hae-Chang Rim, et al. Information retrieval using word senses: root sense tagging approach, 2004, SIGIR '04.

[7] Oren Kurland, et al. Corpus structure, language models, and ad hoc information retrieval, 2004, SIGIR '04.

[8] W. Bruce Croft, et al. Cluster-based retrieval using language models, 2004, SIGIR '04.

[9] Sebastiano Vigna, et al. MG4J at TREC 2005, 2005, TREC.

[10] Clement T. Yu, et al. Word sense disambiguation in queries, 2005, CIKM '05.

[11] Hinrich Schütze, et al. Introduction to information retrieval, 2008.

[12] Tao Tao, et al. Language Model Information Retrieval with Document Expansion, 2006, NAACL.

[13] James Allan, et al. A comparison of statistical significance tests for information retrieval evaluation, 2007, CIKM '07.

[14] Mihai Surdeanu, et al. Learning to Rank Answers on Large Online QA Collections, 2008, ACL.

[15] ChengXiang Zhai, et al. A general optimization framework for smoothing language models on graph structures, 2008, SIGIR '08.

[16] Carol Peters, et al. CLEF 2008: Ad Hoc Track Overview, 2008, CLEF.

[17] Arantxa Otegi, et al. CLEF 2009 Ad Hoc Track Overview: Robust - WSD Task, 2009, CLEF.

[18] Jian-Yun Nie, et al. Smoothing document language model with local word graph, 2009, CIKM.

[19] Hugo Zaragoza, et al. The Probabilistic Relevance Framework: BM25 and Beyond, 2009, Found. Trends Inf. Retr.

[20] Eneko Agirre, et al. A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches, 2009, NAACL.

[21] Eneko Agirre, et al. Personalizing PageRank for Word Sense Disambiguation, 2009, EACL.

[22] Anselmo Peñas, et al. Overview of ResPubliQA 2009: Question Answering Evaluation over European Legislation, 2009, CLEF.

[23] Kevyn Collins-Thompson, et al. Reducing the risk of query expansion via robust constrained optimization, 2009, CIKM.

[24] Christopher D. Manning, et al. Introduction to Information Retrieval, 2010, J. Assoc. Inf. Sci. Technol.