Report on the CLEF-IP 2013 Experiments: Multilayer Collection Selection on Topically Organized Patents

This technical report presents our work applying Distributed Information Retrieval (DIR) methods to federated search of patent documents in the passage retrieval starting from claims task (patentability or novelty search). Patent documents produced worldwide carry manually assigned classification codes, which in our work are used to cluster, distribute and index patents across hundreds or thousands of sub-collections. For source selection, we tested CORI and a new collection selection method, the Multilayer method. For results merging, we tested the CORI and SSL algorithms. We ran experiments using different combinations of the number of collections requested and the number of documents retrieved from each collection. One aim of the experiments was to test how older DIR methods, which characterize collections using statistics such as term frequencies, perform in patent search and in suggesting relevant collections. Another aim was to evaluate Multilayer, a new collection selection method that follows a multilayer, multi-evidence process to suggest collections, taking advantage of the hierarchical classification of patent documents. We submitted 8 runs. According to PRES@100, our best DIR approach ranked 6th among 21 submitted results.
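
As a minimal sketch of how classification codes can define sub-collections, the Python below groups patents by a truncated IPC code. The helper names `ipc_prefix` and `partition`, the level names, and the choice to place a patent with several codes into every matching sub-collection are hypothetical assumptions for illustration; they are not taken from our system. Truncating at a coarser level (subclass, e.g. `A61K`) or a finer one (main group, e.g. `A61K31`) changes how many sub-collections result.

```python
from collections import defaultdict

def ipc_prefix(code, level):
    """Truncate an IPC code such as 'A61K 31/4045' to one hierarchy level (hypothetical helper)."""
    code = code.replace(" ", "")
    if level == "section":
        return code[:1]              # e.g. 'A'
    if level == "class":
        return code[:3]              # e.g. 'A61'
    if level == "subclass":
        return code[:4]              # e.g. 'A61K'
    if level == "group":
        return code.split("/")[0]    # main group, e.g. 'A61K31'
    raise ValueError(f"unknown IPC level: {level}")

def partition(patents, level="subclass"):
    """Map each truncated IPC code to the sorted list of patents assigned to it.

    patents maps a document id to its list of IPC codes. Assumption: a patent
    with several codes is copied into every matching sub-collection.
    """
    collections = defaultdict(set)
    for doc_id, codes in patents.items():
        for code in codes:
            collections[ipc_prefix(code, level)].add(doc_id)
    return {name: sorted(docs) for name, docs in collections.items()}

# Toy input (hypothetical document ids and codes).
patents = {"EP1": ["A61K 31/4045", "C07D 209/16"], "EP2": ["A61K 9/20"]}
print(partition(patents, "subclass"))  # {'A61K': ['EP1', 'EP2'], 'C07D': ['EP1']}
```

Because a document can land in several sub-collections under this assumption, any merged result list drawn from them should be deduplicated.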

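CORI source selection and CORI results merging can be sketched in a few lines of Python. This sketch uses the published CORI defaults (belief floor b = 0.4 and the constants 50 and 150 in the term-frequency component) and the usual CORI merging heuristic (D' + 0.4·D'·C')/1.4 over min-max normalized document and collection scores. The input format, the function names, the normalization over the selected collections only, and the deduplication step are assumptions for illustration. The sketch does not implement the Multilayer method or SSL.

```python
import math

def cori_scores(query_terms, stats, b=0.4):
    """CORI belief for each collection, averaged over the query terms.

    stats maps a collection name to {"df": {term: document frequency}, "cw": word count}.
    """
    n = len(stats)
    avg_cw = sum(s["cw"] for s in stats.values()) / n
    # cf(t): number of collections containing term t at least once
    cf = {t: sum(1 for s in stats.values() if s["df"].get(t, 0) > 0) for t in query_terms}
    scores = {}
    for name, s in stats.items():
        total = 0.0
        for t in query_terms:
            df = s["df"].get(t, 0)
            tf_part = df / (df + 50 + 150 * s["cw"] / avg_cw)
            idf_part = math.log((n + 0.5) / cf[t]) / math.log(n + 1.0) if cf[t] else 0.0
            total += b + (1 - b) * tf_part * idf_part
        scores[name] = total / len(query_terms)
    return scores

def select_collections(scores, k):
    """Return the k collections with the highest CORI belief."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def minmax(x, lo, hi):
    # Assumption: a degenerate range maps every value to 1.0.
    return (x - lo) / (hi - lo) if hi > lo else 1.0

def cori_merge(results, coll_scores):
    """Merge per-collection lists of (doc_id, score) with the CORI heuristic."""
    selected = [c for c in results if results[c]]
    if not selected:
        return []
    c_lo = min(coll_scores[c] for c in selected)
    c_hi = max(coll_scores[c] for c in selected)
    best = {}
    for c in selected:
        c_norm = minmax(coll_scores[c], c_lo, c_hi)
        doc_scores = [s for _, s in results[c]]
        d_lo, d_hi = min(doc_scores), max(doc_scores)
        for doc_id, s in results[c]:
            d_norm = minmax(s, d_lo, d_hi)
            merged = (d_norm + 0.4 * d_norm * c_norm) / 1.4
            # A patent returned by several sub-collections keeps its best score.
            best[doc_id] = max(merged, best.get(doc_id, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with `k` collections requested and `n` documents retrieved from each, a run would call `select_collections(cori_scores(query_terms, stats), k)`, query each selected sub-collection for its top `n` documents, and pass those lists to `cori_merge`.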