High Recall-Low Cost Model for Patent Retrieval

Patenting is considered a key enabler for many information-centric companies and organizations. The greater the required patent capability, the greater the need for an effective and efficient patent retrieval system. Many conventional patent retrieval systems produce unsatisfactory results for patent queries because they are built on traditional keyword-based models, which inevitably return too many unrelated items. Consequently, patent experts must spend considerable time manually refining the results over repeated searches. In this paper, we propose a specialized patent-searching method that investigates the keyword vectors in every document and their implications for each patent vector. With an elaborated vector-finding algorithm and a ranking capability, documents for valid patents are placed in higher ranks and those for noise patents in lower ranks. As a result, our method finds the target documents efficiently and significantly reduces the noise data in the returned results. Verification on real data sets indicates that our method can serve as a de facto standard for recall-oriented patent retrieval. Experimental results with real-life datasets show that our method outperforms many conventional patent retrieval systems with respect to time and cost.