Exploring Keyphrase Extraction and IPC Classification Vectors for Prior Art Search

In this paper we describe experiments conducted for the CLEF-IP 2011 Prior Art Retrieval track. We examined the impact of 1) using key phrase extraction to generate queries from the input patent and 2) using the citation network and International Patent Classification (IPC) class vectors in ranking patents. In the first approach, variations of a popular key phrase extraction technique were explored for extracting and scoring terms of the query patent; these terms are used as queries to retrieve similar patents. In the second approach, we use a two-stage retrieval model to find similar patents. Each patent is represented as an IPC class vector. The citation network of patents is used to propagate these vectors from a node (patent) to its neighbors (cited patents). Similar patents are found by comparing the query vector with the vectors of patents in the corpus. Text-based search is then used to re-rank this solution set to improve precision. Finally, we also extract citations present within the text of a query patent and add them to the result set. Adding these citations to the results shows significant improvement in Mean Average Precision (MAP).
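The first stage of the second approach can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the count-based IPC vectors, the single propagation step with a fixed `weight`, the cosine similarity measure, and all function names (`ipc_vector`, `propagate`, `cosine`, `rank_candidates`) are assumptions made for illustration only.

```python
import math


def ipc_vector(ipc_classes):
    """Build a sparse IPC class vector (class code -> count). Assumed representation."""
    vec = {}
    for code in ipc_classes:
        vec[code] = vec.get(code, 0.0) + 1.0
    return vec


def propagate(vectors, citations, weight=0.5):
    """Push each patent's IPC vector to the patents it cites.

    `citations` maps a citing patent id to a list of cited patent ids.
    `weight` is a hypothetical damping factor, not a value from the paper.
    Only one propagation step is performed, using the original vectors.
    """
    out = {pid: dict(vec) for pid, vec in vectors.items()}
    for citing, cited_ids in citations.items():
        source = vectors.get(citing, {})
        for cited in cited_ids:
            target = out.setdefault(cited, {})
            for code, value in source.items():
                target[code] = target.get(code, 0.0) + weight * value
    return out


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(value * b.get(code, 0.0) for code, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec, corpus_vecs, top_k=10):
    """Return the ids of the top_k corpus patents most similar to the query vector."""
    scored = [(cosine(query_vec, vec), pid) for pid, vec in corpus_vecs.items()]
    # Sort by descending score, breaking ties by patent id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in scored[:top_k]]
```

In a two-stage setup of this kind, the list returned by `rank_candidates` would serve as the candidate set that a text-based search then re-ranks.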