Effective query generation and postprocessing strategies for prior art patent search

The rapid increase in global competition demands stronger protection of intellectual property rights and underlines the importance of patents as major intellectual property documents. Prior art patent search is the task of identifying patents related to a given patent file, and it is an essential step in judging the validity of a patent application. This article proposes an automated query generation and postprocessing method for prior art patent search. The proposed approach first constructs structured queries by combining terms extracted from different fields of a query patent. It then reranks the retrieved patents by combining the retrieval score with the International Patent Classification (IPC) code similarities between the query patent and the retrieved patents. An extensive set of experiments carried out on a large-scale, real-world dataset shows that selecting 20 or 30 query terms from all fields of the original query patent according to their log(tf)idf values helps form a representative search query, and that this is more effective than using any number of query terms from any single field. Combining the terms extracted from different fields while giving higher importance to terms from the abstract, claims, and description fields than to terms from the title field is shown to be more effective than treating all extracted terms equally when forming the search query. Finally, utilizing the similarities between the IPC codes of the query patent and the retrieved patents is shown to improve the effectiveness of prior art search. © 2012 Wiley Periodicals, Inc.
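
The pipeline described in the abstract (term selection by log(tf)idf, field-weighted query construction, and IPC-based reranking) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the smoothed forms log(1 + tf) and log(N / (1 + df)), the field weights, the prefix-based IPC similarity, the linear interpolation with parameter `lam`, and all function names are assumptions made here for illustration. Retrieval scores are also assumed to be non-negative.

```python
import math
from collections import Counter


def top_terms(fields, doc_freq, num_docs, k=20):
    """Rank terms pooled from the given fields by an assumed log(1+tf)*idf score."""
    tf = Counter()
    for text in fields.values():
        tf.update(text.lower().split())

    def score(term):
        # Assumed smoothing: idf = log(N / (1 + df)).
        idf = math.log(num_docs / (1 + doc_freq.get(term, 0)))
        return math.log(1 + tf[term]) * idf

    # Ties are broken alphabetically so the output is deterministic.
    return sorted(tf, key=lambda t: (-score(t), t))[:k]


def weighted_query(fields, field_weights, doc_freq, num_docs, k=20):
    """Select top terms per field and weight each by its (hypothetical) field weight."""
    weights = Counter()
    for name, text in fields.items():
        for term in top_terms({name: text}, doc_freq, num_docs, k):
            weights[term] += field_weights.get(name, 1.0)
    return dict(weights)


def ipc_levels(code):
    """Split a full IPC code such as 'G06F 17/30' into its five hierarchy prefixes."""
    subclass, _, group = code.strip().partition(" ")
    levels = [subclass[:1], subclass[:3], subclass[:4]]
    if group:
        levels.append(f"{subclass} {group.split('/')[0]}")
        levels.append(f"{subclass} {group}")
    return levels


def ipc_similarity(code_a, code_b):
    """Fraction of the five IPC hierarchy levels shared from the top (an assumption)."""
    shared = 0
    for x, y in zip(ipc_levels(code_a), ipc_levels(code_b)):
        if x != y:
            break
        shared += 1
    return shared / 5


def rerank(query_ipcs, results, lam=0.3):
    """Rerank (patent_id, retrieval_score, ipc_codes) tuples.

    The final score interpolates the max-normalized retrieval score with the
    best IPC similarity between any query code and any candidate code.
    """
    top = max((s for _, s, _ in results), default=0.0) or 1.0
    rescored = []
    for pid, s, ipcs in results:
        sim = max(
            (ipc_similarity(q, c) for q in query_ipcs for c in ipcs),
            default=0.0,
        )
        rescored.append((pid, (1 - lam) * s / top + lam * sim))
    return sorted(rescored, key=lambda r: -r[1])
```

In this sketch, a candidate that shares the main group of the query patent's IPC code can overtake a candidate with a slightly higher retrieval score but an unrelated classification, which is the kind of effect the reranking step is meant to produce. The value of `lam` and the field weights would need to be tuned on held-out data.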
