Proposal of two-stage patent retrieval method considering the claim structure

The importance of patents is increasing in global society. In preparing a patent application, it is essential to search for related patents that may invalidate the invention. However, it is time-consuming to identify them among the millions of patents. This article proposes a patent-retrieval method that considers a claim structure for a more accurate search for invalidity. This method uses a claim text as input; it consists of two retrieval stages. In stage 1, general text analysis and retrieval methods are applied to improve recall. In stage 2, the top N documents retrieved in stage 1 are rearranged to improve precision by applying text analysis and retrieval methods using the claim structure. Our two-stage retrieval introduces five precision-oriented analysis and retrieval methods: query-term extraction from a portion of a claim text that describes the characteristics of a claim; query term-weighting without term frequency; query term-weighting with “measurement terms”; text retrieval using only claims as a target; and calculating the relevant score by “partially” adding scores in stage 2 to those in stage 1. Evaluation results using test sets of the NTCIR4 Patent Retrieval Task show that our methods are effective, though the degree of the effectiveness varies depending on the test sets.