Comparing click-through data to purchase decisions for retrieval evaluation

Traditional retrieval evaluation uses explicit relevance judgments which are expensive to collect. Relevance assessments inferred from implicit feedback such as click-through data can be collected inexpensively, but may be less reliable. We compare assessments derived from click-through data to another source of implicit feedback that we assume to be highly indicative of relevance: purchase decisions. Evaluating retrieval runs based on a log of an audio-visual archive, we find agreement between system rankings and purchase decisions to be surprisingly high.