Federated Patent Search

Federated search, also known as distributed information retrieval (DIR), is a technique for searching multiple text collections simultaneously. This chapter presents the basic components of a typical federated search system and the main technical challenges that arise in each component during operation. We briefly review federated search methods and techniques and how they can be applied in the patent domain. We also discuss problems that DIR research usually ignores but that real federated patent search systems must address in practice. We then present PerFedPat, an interactive patent search system based on the federated search approach. PerFedPat provides core services for searching multiple online patent resources with a federated method, giving parallel access to multiple patent sources. It hides this complexity from the end user, who queries all patent datasets at the same time through a single, common query tool. The second innovative feature of PerFedPat is its pluggable and extensible architecture, which allows multiple search tools to be integrated into the system. We present an example of such a tool, the IPC suggestion tool, which uses a federated search technique (specifically, source selection) that exploits topically organised patents (using their intellectually assigned classification codes) to support patent searches through automated IPC suggestion. This tool shows how DIR techniques can be applied beyond the typical scenario of implementing a federated search system.
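
To make the idea of source selection over topically organised patents more concrete, the following is a minimal sketch in which each IPC class is treated as a "source". It is not the PerFedPat implementation: the function name `suggest_ipc`, the reciprocal-rank voting scheme, and the toy data are assumptions chosen for illustration. Given patents already ranked for a query (for example, retrieved from a sample index), each patent votes for its IPC codes, and the classes with the highest accumulated votes are suggested.

```python
from collections import defaultdict


def suggest_ipc(ranked_patents, top_k=3):
    """Suggest IPC classes for a query (hypothetical helper).

    ranked_patents: list of (patent_id, [ipc_codes]) ordered by relevance
    to the query, best first. Each patent votes for its IPC codes with a
    weight of 1/rank, so highly ranked patents count more.
    """
    scores = defaultdict(float)
    for rank, (_patent_id, ipc_codes) in enumerate(ranked_patents, start=1):
        for code in set(ipc_codes):  # count each code once per patent
            scores[code] += 1.0 / rank
    # Sort by descending score; break ties alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [code for code, _score in ordered[:top_k]]


# Toy ranking, invented for illustration only.
ranked = [
    ("P1", ["G06F"]),
    ("P2", ["H04L", "G06F"]),
    ("P3", ["H04L"]),
    ("P4", ["A61K"]),
]
suggestions = suggest_ipc(ranked, top_k=2)
print(suggestions)
```

In a federated setting, the suggested classes could then be used to restrict the query to the corresponding sub-collections; other voting or scoring functions could be substituted for the reciprocal-rank weights used here.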
