Enhanced Web document retrieval using automatic query expansion

The ever-growing popularity of the Internet as a source of information, together with the growing number of documents available through the World Wide Web, is increasing the demand for more efficient and accurate information retrieval tools. Numerous techniques have been proposed and tried for improving the effectiveness of searching the World Wide Web for documents relevant to a given topic of interest. The success of a query, as measured by the relevance of the documents retrieved, depends crucially on the user specifying appropriate keywords and phrases. Users' limited knowledge of the search topic and their changing information needs often make it difficult for them to find suitable keywords or phrases for a query. The resulting searches fail to cover all likely aspects of the topic of interest. We describe a scheme that attempts to remedy this situation by automatically expanding the user query through the analysis of initially retrieved documents. We present experimental results that demonstrate the effectiveness of the query expansion scheme.
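The general idea of expanding a query from initially retrieved documents can be illustrated with a small sketch. This is not the scheme evaluated in the paper, whose details the abstract does not give; it is a generic pseudo-relevance feedback loop under simplifying assumptions. The function names (`tokenize`, `score`, `expand_query`), the tiny stopword list, the term-overlap ranking, and the parameters `top_docs` and `new_terms` are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = frozenset({"a", "an", "the", "of", "and", "to", "in", "for", "is", "on"})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms, dropping stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def score(query_terms, doc_terms):
    """Rank signal (assumed): total occurrences of query terms in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def expand_query(query, documents, top_docs=2, new_terms=3):
    """Return the query terms plus terms that recur in the top-ranked documents."""
    query_terms = tokenize(query)
    tokenized = [tokenize(d) for d in documents]

    # Initial retrieval: order documents by overlap with the original query.
    ranked = sorted(tokenized, key=lambda d: score(query_terms, d), reverse=True)
    feedback = ranked[:top_docs]

    # Count, for each non-query term, how many feedback documents contain it.
    candidates = Counter()
    for doc in feedback:
        candidates.update(set(doc) - set(query_terms))

    # Most widely shared terms first; alphabetical order breaks ties deterministically.
    best = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:new_terms]
    return query_terms + [term for term, _ in best]
```

Treating the top-ranked documents as if they were relevant is the assumption that makes the expansion automatic: no user judgment is required, but a poor initial ranking can pull in off-topic terms.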
