Improving Retrievability of Patents in Prior-Art Search

Prior-art search is an important task in patent retrieval, and its success depends on the selection of relevant search queries. Typically, terms for prior-art queries are extracted from the claim fields of query patents. However, because of the complex technical structure of patents and the presence of term mismatch and vague terms, selecting relevant query terms is difficult. When evaluating the retrievability coverage of prior-art queries generated from query patents, we observe a large bias toward a subset of the collection: many patents either have a very low retrievability score or cannot be discovered by any query. To increase the retrievability of patents, in this paper we expand prior-art queries generated from query patents using query expansion with pseudo-relevance feedback. Terms missing from query patents are discovered in feedback patents, and better patents for relevance feedback are identified using a novel approach that checks their similarity with query patents. We specifically focus on how to automatically select better terms from query patents, based on their proximity distribution with prior-art queries, for use as features in computing this similarity. Our results show that the coverage of prior-art queries can be increased significantly by incorporating relevant query terms through query expansion.
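
The abstract relies on a retrievability score and on the notion of bias across the collection without defining either. The block below is a minimal sketch of the cumulative retrievability measure of Azzopardi and Vinay, r(d) = sum over queries q of f(k_dq, c), assuming every query has equal weight and f is a hard rank cutoff (1 if document d appears within the top c results of q, else 0). Bias is summarized with the standard Gini coefficient (0 for perfectly equal retrievability, approaching 1 when a few documents absorb all retrievals). The function names, the `rankings` dictionary layout, and the cutoff value are illustrative assumptions, not the implementation used in this paper.

```python
from typing import Dict, Iterable, List


def retrievability(
    rankings: Dict[str, List[str]],
    collection: Iterable[str],
    cutoff: int = 100,
) -> Dict[str, int]:
    """Cumulative retrievability r(d): how many queries rank d within the top `cutoff`.

    Assumption: equal query weights (o_q = 1) and a hard cutoff f(k, c) = [k <= c].
    `rankings` maps a (hypothetical) query id to its ranked list of document ids.
    """
    scores = {doc: 0 for doc in collection}
    for ranked_docs in rankings.values():
        # Slicing keeps ranks 1..cutoff, i.e. f(k, c) = 1 exactly when k <= c.
        for doc in ranked_docs[:cutoff]:
            if doc in scores:
                scores[doc] += 1
    return scores


def gini(values: List[float]) -> float:
    """Standard Gini coefficient of a list of non-negative scores."""
    xs = sorted(values)  # ascending order is required by the formula below
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with 1-based i over ascending x
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


if __name__ == "__main__":
    # Toy data: three hypothetical prior-art queries over six hypothetical patents.
    rankings = {
        "q1": ["p1", "p2", "p3"],
        "q2": ["p1", "p3", "p4"],
        "q3": ["p1", "p2", "p5"],
    }
    collection = ["p1", "p2", "p3", "p4", "p5", "p6"]
    r = retrievability(rankings, collection, cutoff=2)
    never_found = [d for d, s in r.items() if s == 0]
    print(r, never_found, round(gini(list(r.values())), 3))
```

With a cutoff of 2, patent p1 is retrieved by all three queries while p4, p5 and p6 are never retrieved, which is the kind of skew the abstract describes; query expansion aims to raise the scores of such low-retrievability patents and thereby lower the Gini value.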
