Query Phrase Expansion Using Wikipedia in Patent Class Search

Relevance feedback methods generally suffer from topic drift caused by word ambiguity and synonymous uses of words. To alleviate this inherent problem, we propose a novel query phrase expansion approach that uses semantic annotations in Wikipedia pages to enrich queries with context-disambiguating phrases. Focusing on the patent domain, specifically on patent class search, where patents are classified into a hierarchy of categories, we examine the roles that phrases and words play in query expansion when determining document relevance, and their contributions to alleviating the query drift problem. We compare our approach against the Relevance Model, a state-of-the-art method, and show that it achieves higher mean average precision (MAP) at all levels of the classification hierarchy.
