NTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task

Users express their information needs as queries to find relevant documents on the web. However, queries are usually short, so search engines may not have enough information to determine the exact intent behind them. How to diversify web search results so that they cover as many of the users' possible intents as possible is therefore an important research issue. In this paper, we propose several subtopic mining approaches and show how to diversify search results using the mined subtopics. For the Subtopic Mining subtask, we explore various algorithms that mine the subtopics of a query from the enormous number of documents on the web. For the Document Ranking subtask, we propose re-ranking algorithms that keep as many popular subtopics as possible among the top-ranked results; these algorithms use the subtopics produced by the subtopic mining algorithms to diversify the search results. On the Chinese Subtopic Mining subtask of the NTCIR-9 Intent task, our best run achieves an I-rec@10 (Intent Recall) of 0.4683, a D-nDCG@10 of 0.6546 and a D#-nDCG@10 of 0.5615; on the Chinese Document Ranking subtask, it achieves an I-rec@10 of 0.6180, a D-nDCG@10 of 0.3314 and a D#-nDCG@10 of 0.4747. On the Japanese Subtopic Mining subtask, our best run achieves an I-rec@10 of 0.4442, a D-nDCG@10 of 0.4244 and a D#-nDCG@10 of 0.4343; on the Japanese Document Ranking subtask, it achieves an I-rec@10 of 0.5975, a D-nDCG@10 of 0.2953 and a D#-nDCG@10 of 0.4464.
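To make the re-ranking idea concrete, the following is a minimal sketch of one possible greedy strategy for keeping popular subtopics among the top-ranked results. It is an illustration under stated assumptions, not the algorithm evaluated in this paper: the function name `rerank_by_subtopics`, the document-to-subtopic mapping, and the popularity weights are all hypothetical, and we assume the subtopics of each document have already been identified by some mining step.

```python
from typing import Dict, List, Set


def rerank_by_subtopics(
    ranked_docs: List[str],
    doc_subtopics: Dict[str, Set[str]],
    subtopic_popularity: Dict[str, float],
    k: int = 10,
) -> List[str]:
    """Greedily re-rank so the top k results cover popular subtopics.

    Hypothetical illustration: at each step, pick the document that adds
    the largest total popularity over subtopics not yet covered. Ties are
    broken in favour of the original ranking order.
    """
    remaining = list(ranked_docs)  # keeps the original order
    covered: Set[str] = set()
    reranked: List[str] = []

    def gain(doc: str) -> float:
        # Popularity mass of the subtopics this document would newly cover.
        new_subtopics = doc_subtopics.get(doc, set()) - covered
        return sum(subtopic_popularity.get(s, 0.0) for s in new_subtopics)

    while remaining and len(reranked) < k:
        # max() returns the first maximal element, so earlier-ranked
        # documents win ties.
        best = max(remaining, key=gain)
        reranked.append(best)
        covered |= doc_subtopics.get(best, set())
        remaining.remove(best)

    # Documents beyond the top k keep their original relative order.
    return reranked + remaining


# Toy data (invented for illustration only).
ranked = ["d1", "d2", "d3", "d4"]
subtopics = {
    "d1": {"price"},
    "d2": {"price"},
    "d3": {"reviews"},
    "d4": {"price", "specs"},
}
popularity = {"price": 0.5, "reviews": 0.3, "specs": 0.2}

print(rerank_by_subtopics(ranked, subtopics, popularity))
```

In this toy example, the document covering two subtopics is promoted first, followed by the one that adds the remaining uncovered subtopic; documents that add nothing new fall back to their original order.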