Word sense disambiguation in information retrieval using query expansion

Using words to represent document contents and queries in information retrieval raises two problems: ambiguity (one word with several meanings) and synonymy (different words that represent the same concept). Both problems can be addressed with query expansion. We analyse how query expansion, word sense disambiguation (WSD), iterated relevance feedback, and several retrieval variations affect retrieval performance. In this paper, WSD is implemented in Lucene using query expansion with a thesaurus and relevance feedback. The Extended Lesk algorithm was re-implemented to disambiguate the query using WordNet. Expansion was limited to at most 20 terms, chosen from candidates drawn from the disambiguated query's sense information, co-occurring terms, and the most frequent terms, using Kullback-Leibler distance. We iterated the process to find the best number of expansion iterations. We found that applying WSD to the query can make the search process up to 161 times longer in the worst case. Query expansion using disambiguated sense information had little effect on performance, whereas using information from relevance feedback did affect it. This experiment provides a better understanding of the effect of WSD on information retrieval system performance.
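As a rough illustration of the Kullback-Leibler term selection step, the sketch below scores each candidate term by its contribution to the divergence between the feedback documents and the whole collection, then keeps the highest-scoring terms. It is a minimal sketch under stated assumptions, not the implementation used in the paper: the function name `kld_expansion_terms`, the tokenised-list input format, and the scoring form p_fb(t) * log(p_fb(t) / p_coll(t)) are illustrative choices, and only the cap of 20 terms is taken from the abstract.

```python
import math
from collections import Counter


def kld_expansion_terms(feedback_docs, collection_docs, query_terms, max_terms=20):
    """Rank candidate terms by a KLD-style score and return the top ones.

    feedback_docs and collection_docs are lists of token lists (an assumed
    input format). query_terms holds the original query words, which are
    skipped because they are already part of the query.
    """
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())

    scores = {}
    for term, count in fb_counts.items():
        # Skip query words and any term missing from the collection statistics.
        if term in query_terms or coll_counts[term] == 0:
            continue
        p_fb = count / fb_total                    # probability in feedback docs
        p_coll = coll_counts[term] / coll_total    # probability in the collection
        scores[term] = p_fb * math.log(p_fb / p_coll)

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:max_terms]
```

Terms that are frequent in the feedback documents but rare in the collection as a whole receive the highest scores, which is why this kind of measure tends to favour topic-specific vocabulary over common words.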
