An exploration of pattern-based subtopic modeling for search result diversification

Traditional information retrieval models do not necessarily provide users with optimal search experience because the top ranked documents may contain the same piece of relevant information, i.e., the same subtopic of a query. The goal of search result diversification is to return search results that not only are relevant to the query but also cover different subtopics. Therefore, the subtopic modeling is an important research topic in search result diversification. In this paper, we propose a novel pattern based method to extract subtopics from retrieved documents. The basic idea is to explicitly model a query subtopic as a semantically meaningful text unit in relevant documents. We apply a frequent pattern mining algorithm to efficiently extract these text units (patterns) from retrieved documents. We then model a query subtopic with a single pattern and rank subtopics based on their similarity with the query. These pattern based subtopics are then used to diversify search results.