Patent retrieval: a literature review

As the number of patent applications filed each year continues to grow, effective and efficient systems for managing such large amounts of data become increasingly important. Patent retrieval (PR) is considered the pillar of almost all patent analysis tasks. PR is a subfield of information retrieval (IR) concerned with developing techniques and methods that effectively and efficiently retrieve relevant patent documents in response to a given search request. In this paper, we present a comprehensive review of PR methods and approaches. It is clear that the recent successes and maturity of IR applications such as Web search cannot be transferred directly to PR without deliberate domain adaptation and customization. Furthermore, state-of-the-art performance in automatic PR remains only moderate in terms of recall. These observations motivate the need for interactive search tools that provide cognitive assistance to patent professionals while demanding minimal effort from them. These tools must also be developed hand in hand with patent professionals, taking their practices and expectations into account. We additionally touch on tasks related to PR, such as patent valuation, litigation, and licensing, and highlight potential opportunities and open directions for computational scientists in these domains.
