The impact of intent selection on diversified search evaluation

To construct a diversified search test collection, a set of possible subtopics (or intents) needs to be determined for each topic, in one way or another, and perintent relevance assessments need to be obtained. In the TREC Web Track Diversity Task, subtopics are manually developed at NIST, based on results of automatic click log analysis; in the NTCIR INTENT Task, intents are determined by manually clustering 'subtopics strings' returned by participating systems. In this study, we address the following research question: Does the choice of intents for a test collection affect relative performances of diversified search systems? To this end, we use the TREC 2012 Web Track Diversity Task data and the NTCIR-10 INTENT-2 Task data, which share a set of 50 topics but have different intent sets. Our initial results suggest that the choice of intents may affect relative performances, and that this choice may be far more important than how many intents are selected for each topic