Boosting Venue Page Rankings for Contextual Retrieval: Georgetown at TREC 2013 Contextual Suggestion Track

We participate in the closed-collection sub-track of the TREC 2013 Contextual Suggestion track. The dataset that we use is an integrated collection of ClueWeb12 Category B, Wikitravel, and the city-specific sub-collection, all of which are drawn from ClueWeb12. Since the Open Web is not used in our submissions, the task is essentially a retrieval task rather than a result-merging task. Our system takes users’ ratings of venues in a training city as input and generates titles, document identification numbers, and descriptions for venues that fit users’ interests in a new city. Ideally, the relevant documents for this task form a list of Web pages, each of which is a venue’s homepage; we call such a page a “venue page”. However, off-the-shelf search tools, such as Lemur, fail to retrieve such venue homepages from the collection. They retrieve either non-relevant documents or “yellow-page”-like pages that link to long lists of venue pages, where the links are often broken and the destination pages lie outside the collection. As a result, a large portion of the retrieved documents are not suitable as answers for contextual suggestion. To address this challenge, we experiment with two different approaches, a precision-oriented approach and a recall-oriented approach, to boost the ranking of relevant venue pages.