Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track

The TREC 2013 Contextual Suggestion Track allowed participants to submit personalised rankings using documents either from the Open Web or from an archived, static Web collection, the ClueWeb12 dataset. We argue that this setting poses problems for comparing the performance of the participants. We analyse biases found in the process, both objective and subjective, and discuss these issues in the general framework of evaluating personalised Information Retrieval using dynamic versus static datasets.