Patent query reduction using pseudo relevance feedback

Queries in patent prior art search are full patent applications, and are therefore much longer than the topics used in standard ad hoc search and web search. Standard information retrieval (IR) techniques are not entirely effective for patent prior art search because these very long queries contain many ambiguous terms. Reducing patent queries by extracting key terms has been shown to be ineffective, mainly because it is not clear what the focus of the query is. An optimal query reduction algorithm must thus retain the terms that are useful for retrieval, favouring recall of relevant patents, while removing the terms that impair IR effectiveness. We propose a new query reduction technique that decomposes a patent application into constituent text segments and computes a Language Modeling (LM) similarity for each segment, defined as the probability of generating that segment from the top ranked documents. We reduce a patent query by removing the least similar segments, hypothesising that their removal can increase the precision of retrieval while still retaining the useful context needed to achieve high recall. Experiments on the patent prior art search collection CLEF-IP 2010 show that the proposed method outperforms standard pseudo-relevance feedback (PRF) and a naive method of query reduction based on removal of unit frequency terms (UFTs).
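The segment-level reduction described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: segments are fixed-size word windows, the feedback language model is estimated from the pooled top ranked documents, smoothing is Jelinek-Mercer interpolation with an add-one background model, scores are length-normalised log-probabilities, and a fixed fraction of the lowest-scoring segments is dropped. The names `segment`, `segment_score`, `reduce_query`, `seg_size`, `drop_fraction` and `lam` are hypothetical.

```python
import math
from collections import Counter


def segment(tokens, size):
    """Split a token list into consecutive fixed-size segments (the last may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def segment_score(seg, fb_counts, fb_len, bg_counts, bg_len, lam=0.5):
    """Length-normalised log-probability of generating `seg` from the feedback model.

    Assumption: Jelinek-Mercer interpolation between the feedback (top ranked
    documents) model and an add-one smoothed background model.
    """
    vocab = len(bg_counts) + 1  # +1 leaves probability mass for unseen terms
    total = 0.0
    for term in seg:
        p_fb = fb_counts[term] / fb_len if fb_len else 0.0
        p_bg = (bg_counts[term] + 1) / (bg_len + vocab)
        total += math.log(lam * p_fb + (1 - lam) * p_bg)
    return total / len(seg)


def reduce_query(query_tokens, feedback_docs, background_docs,
                 seg_size=5, drop_fraction=0.3):
    """Remove the segments least likely to be generated by the feedback documents.

    `feedback_docs` stands in for the top ranked documents of an initial
    retrieval run; `background_docs` stands in for the whole collection.
    Both are lists of token lists.
    """
    segs = segment(query_tokens, seg_size)
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    bg_counts = Counter(t for doc in background_docs for t in doc)
    fb_len = sum(fb_counts.values())
    bg_len = sum(bg_counts.values())

    scores = [segment_score(s, fb_counts, fb_len, bg_counts, bg_len) for s in segs]

    # Indices of the lowest-scoring segments, i.e. the ones to drop.
    n_drop = int(len(segs) * drop_fraction)
    drop = set(sorted(range(len(segs)), key=lambda i: scores[i])[:n_drop])

    # Keep the surviving segments in their original order.
    return [t for i, s in enumerate(segs) if i not in drop for t in s]
```

In a real system the feedback documents would come from an initial retrieval run using the full patent application as the query, and the reduced query would then be used for a second retrieval run. The segment size and the fraction of segments to drop are tuning parameters in this sketch.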
