Automatic boolean query suggestion for professional search

Search tasks in professional environments, such as patent or legal search, have two distinctive characteristics: 1) users interactively issue several queries for a topic, and 2) users are willing to examine many retrieval results, i.e., there is typically an emphasis on recall. Recent surveys have also confirmed that professional searchers continue to strongly prefer Boolean queries because they provide a record of what documents were searched. To support this type of professional search, we propose a novel Boolean query suggestion technique. Specifically, we generate Boolean queries by exploiting decision trees learned from pseudo-labeled documents and rank the suggested queries using query quality predictors. We evaluate our algorithm in simulated patent and medical search environments. In a comparison with a recent, effective query generation system, we demonstrate that our technique is effective and general.
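
The idea of turning decision-tree paths into Boolean queries and then ranking them can be illustrated with a minimal sketch. It is not the authors' implementation: the toy documents, the pseudo-labels, the information-gain tree learner, the function names, and the use of average IDF as the query quality predictor are all assumptions made here for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_term(docs, labels, terms):
    """Term whose presence/absence split has the highest information gain."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for t in sorted(terms):  # sorted for deterministic tie-breaking
        has = [y for d, y in zip(docs, labels) if t in d]
        lacks = [y for d, y in zip(docs, labels) if t not in d]
        split = (len(has) * entropy(has) + len(lacks) * entropy(lacks)) / len(labels)
        if base - split > best_gain + 1e-12:
            best, best_gain = t, base - split
    return best

def boolean_queries(docs, labels, terms, path=(), depth=3):
    """One conjunctive query per tree path ending in a purely positive leaf."""
    if not labels:
        return []
    if all(y == 1 for y in labels):
        return [path] if path else []
    if depth == 0 or all(y == 0 for y in labels):
        return []
    t = best_term(docs, labels, terms)
    if t is None:
        return []
    queries = []
    for keep, literal in ((True, t), (False, "NOT " + t)):
        pairs = [(d, y) for d, y in zip(docs, labels) if (t in d) == keep]
        queries += boolean_queries([d for d, _ in pairs], [y for _, y in pairs],
                                   terms - {t}, path + (literal,), depth - 1)
    return queries

def avg_idf(query, docs):
    """Assumed stand-in quality predictor: mean IDF of the non-negated terms."""
    pos = [lit for lit in query if not lit.startswith("NOT ")]
    if not pos:
        return 0.0
    return sum(math.log(len(docs) / sum(1 for d in docs if t in d)) for t in pos) / len(pos)

def rank_queries(queries, docs):
    """Sort candidate queries by predicted quality, best first."""
    ordered = sorted(queries, key=lambda q: avg_idf(q, docs), reverse=True)
    return [" AND ".join(q) for q in ordered]

# Toy collection: each document is a set of terms. Label 1 marks a
# pseudo-relevant document (e.g. top-ranked for an initial query), 0 the rest.
docs = [
    {"battery", "lithium", "anode"},
    {"battery", "lithium", "cathode"},
    {"battery", "solar", "panel"},
    {"engine", "fuel", "piston"},
    {"lithium", "mining", "ore"},
    {"accumulator", "lithium", "electrolyte"},
]
labels = [1, 1, 0, 0, 0, 1]
terms = set().union(*docs)

ranked = rank_queries(boolean_queries(docs, labels, terms), docs)
print(ranked)
```

Each root-to-leaf path becomes a conjunction of required and negated terms, so a suggested query is directly readable and auditable, which matches the preference for Boolean queries described above. A real system would use a larger feature set and a learned combination of several quality predictors rather than a single score.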
