Improving Patent Search by Search Result Diversification

Patent retrieval differs from web search in several ways. One major task in this domain is prior-art (or invalidity) search: finding existing patents that may invalidate a new patent, with search queries formulated from the query patent (i.e., the new patent). Because a patent document generally contains long and complex descriptions, generating effective search queries is difficult. Typically, these queries must cover diverse aspects of the new patent application in order to retrieve relevant documents that cover the full scope of the patent. Given this context, search result diversification techniques can potentially improve the retrieval performance of patent search by introducing diversity into the document ranking. In this paper, we examine the effectiveness of a recent term-based diversification framework for patent search. Using this framework involves developing methods to identify effective phrases related to the topics mentioned in the query patent. In our experiments, we evaluate our diversification approach using standard measures of retrieval effectiveness and diversity, and show significant improvements relative to state-of-the-art baselines.
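To make the idea of term-based diversification concrete, the following is a minimal sketch, not the framework evaluated in the paper. It assumes an xQuAD-style greedy re-ranker in which each phrase extracted from the query patent is treated as one aspect. The function name `diversify`, the parameters `relevance`, `coverage`, `term_weights`, and `lam`, and the toy scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def diversify(
    relevance: Dict[str, float],
    coverage: Dict[str, Dict[str, float]],
    term_weights: Dict[str, float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy xQuAD-style re-ranking that treats each term as an aspect.

    relevance[d]    : initial retrieval score of document d (assumed in [0, 1])
    coverage[d][t]  : how well document d covers term t (assumed in [0, 1])
    term_weights[t] : importance of term t for the query patent
    lam             : trade-off between relevance (0.0) and diversity (1.0)
    """
    selected: List[str] = []
    # not_covered[t] is the degree to which term t is still uncovered
    # by the documents selected so far.
    not_covered = {t: 1.0 for t in term_weights}
    candidates = set(relevance)

    def score(doc: str) -> float:
        doc_cov = coverage.get(doc, {})
        novelty = sum(
            w * doc_cov.get(t, 0.0) * not_covered[t]
            for t, w in term_weights.items()
        )
        return (1.0 - lam) * relevance[doc] + lam * novelty

    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Discount every term that the chosen document already covers.
        for t in term_weights:
            not_covered[t] *= 1.0 - coverage.get(best, {}).get(t, 0.0)
    return selected


# Toy data (hypothetical): two documents about the same topic, one about another.
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.4}
coverage = {
    "d1": {"fuel cell": 1.0},
    "d2": {"fuel cell": 1.0},
    "d3": {"membrane": 1.0},
}
term_weights = {"fuel cell": 0.5, "membrane": 0.5}

print(diversify(relevance, coverage, term_weights, k=3))           # ['d1', 'd3', 'd2']
print(diversify(relevance, coverage, term_weights, k=3, lam=0.0))  # ['d1', 'd2', 'd3']
```

With `lam=0.5`, the less relevant `d3` moves ahead of `d2` because `d1` has already covered the "fuel cell" term, which is the behavior a prior-art searcher would want when the query patent spans several topics. With `lam=0.0`, the ranking falls back to the original relevance order.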
