A Patent Retrieval Method Using a Hierarchy of Clusters at TUT

To retrieve relevant documents from a very large collection, one typically either ranks documents by a similarity or distance measure between the query and each document, or applies document clustering to partition the collection into groups of related documents. For patent retrieval, however, query terms alone are often insufficient, because patents use complex, domain-specific terminology. One way to address this problem is query expansion. We extend the standard vector space model with co-clustering techniques: we apply co-clustering to the document collection at different levels of cluster granularity to generate a hierarchy of clusters, and then expand the query using this hierarchy. We participated in the NTCIR-5 Patent Retrieval Task (Document Retrieval Subtask) with our system, and we report the effectiveness of our approach for patent retrieval in experiments on the NTCIR-4 and NTCIR-5 test collections.
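
As a rough illustration of the idea of expanding a query with a hierarchy of clusters, the following minimal sketch may help. It is not the system described here: the toy documents, the hand-assigned term clusters in `HIERARCHY` (which stand in for the output of co-clustering at two granularities), the per-level weights in `LEVEL_WEIGHTS`, and the function names are all assumptions made for illustration. The sketch only shows how terms that share a cluster with a query term could be added to the query vector with a smaller weight at coarser levels, before ranking by cosine similarity in a plain term-frequency vector space.

```python
import math
from collections import Counter

# Toy collection (hypothetical documents, not taken from any NTCIR test collection).
DOCS = {
    "d1": "engine piston cylinder combustion",
    "d2": "motor rotor stator winding",
    "d3": "piston ring seal cylinder",
    "d4": "battery electrode electrolyte cell",
}

# Hypothetical term clusters at two levels of granularity (fine first, coarse second).
# These are hand-assigned for illustration; they are NOT computed by co-clustering here.
HIERARCHY = [
    [
        {"engine", "piston", "cylinder", "combustion"},
        {"ring", "seal"},
        {"motor", "rotor", "stator", "winding"},
        {"battery", "electrode", "electrolyte", "cell"},
    ],
    [
        {"engine", "piston", "cylinder", "combustion", "ring", "seal",
         "motor", "rotor", "stator", "winding"},
        {"battery", "electrode", "electrolyte", "cell"},
    ],
]

# Assumed expansion weights per level: coarser clusters contribute less.
LEVEL_WEIGHTS = [0.5, 0.2]


def expand_query(terms):
    """Return a weighted query vector: original terms plus cluster mates."""
    weights = {t: 1.0 for t in terms}
    query_terms = set(terms)
    for clusters, level_weight in zip(HIERARCHY, LEVEL_WEIGHTS):
        for cluster in clusters:
            if cluster & query_terms:
                for t in cluster:
                    # Keep the largest weight a term has received so far.
                    weights[t] = max(weights.get(t, 0.0), level_weight)
    return weights


def cosine(query_vec, doc_vec):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * doc_vec.get(t, 0) for t, w in query_vec.items())
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    norm_d = math.sqrt(sum(v * v for v in doc_vec.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def rank(terms, expand=True):
    """Rank all documents against the (optionally expanded) query."""
    query_vec = expand_query(terms) if expand else {t: 1.0 for t in terms}
    doc_vecs = {doc_id: Counter(text.split()) for doc_id, text in DOCS.items()}
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this toy setting, calling `rank(["engine"], expand=False)` gives document `d3` a score of zero because it does not contain the query term, whereas `rank(["engine"])` gives it a positive score through the cluster mates "piston" and "cylinder". A real system would derive the clusters from the collection itself and would likely use a term weighting such as TF-IDF rather than raw term frequencies.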