Query Generation Techniques for Patent Prior-Art Search in Multiple Languages

Patent prior-art search is a necessary step to ensure that no similar disclosure was made before a patent is granted. The task is to identify all relevant information that may invalidate the originality of a claim in a patent application. Using the whole patent, or extracting highly indicative terms from it, to form a query reduces the search burden on the user. To date, no large-scale experiments have been conducted specifically to evaluate query generation techniques for patent prior-art search in multiple languages. In this paper, we first introduce seven methods for generating patent queries for ranking. A large-scale experimental evaluation was then carried out on the CLEF-IP 2009 multilingual dataset in English, French and German. We present a detailed comparison of the methods in terms of performance and efficiency, and also compare them with the use of full-length documents as queries in patent search. The results show that some methods that work well in general information retrieval fail to achieve the same effectiveness in patent search. The methods also perform differently depending on the query and document languages.
