Off-line Evaluation of Recommendation Functions

This paper proposes a novel method for assessing the performance of any Web recommendation function (i.e., user model), M, used in a Web recommender system, based on an off-line computation using labeled session data. Each labeled session consists of a sequence of Web pages followed by a page $p^{(\mathit{IC})}$ that contains information the user claims is relevant. We then apply M to each session to produce a corresponding suggested page $p^{(\mathit{S})}$. In general, we say that M is good if $p^{(\mathit{S})}$ has content “similar” to the associated $p^{(\mathit{IC})}$ from the same session. This paper defines a number of functions for estimating this $p^{(\mathit{S})}$-to-$p^{(\mathit{IC})}$ similarity, which can be used to evaluate any new model off-line, and provides empirical data to demonstrate that evaluations based on these similarity functions match our intuitions.
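
As an illustration only, the off-line evaluation loop described above might be sketched as follows. The sketch assumes that each page is represented by its text, that a labeled session is a pair (visited pages, IC page), and that the quality of M is summarized as the mean similarity over all sessions. The word-overlap (Jaccard) measure and the names `jaccard_similarity`, `evaluate`, and `last_page_model` are hypothetical stand-ins chosen for this example; they are not the similarity functions or models defined in this paper.

```python
from typing import Callable, List, Tuple

Page = str                          # assumption: a page is represented by its text
Session = Tuple[List[Page], Page]   # (visited pages, IC page labeled by the user)
Model = Callable[[List[Page]], Page]


def jaccard_similarity(a: Page, b: Page) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in, not the paper's measure."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty pages are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def evaluate(model: Model,
             sessions: List[Session],
             similarity: Callable[[Page, Page], float] = jaccard_similarity) -> float:
    """Mean similarity between each suggested page and its session's IC page."""
    if not sessions:
        raise ValueError("at least one labeled session is required")
    scores = []
    for visited, ic_page in sessions:
        suggested = model(visited)             # the suggested page for this session
        scores.append(similarity(suggested, ic_page))
    return sum(scores) / len(scores)


def last_page_model(visited: List[Page]) -> Page:
    """Hypothetical baseline model: suggest the last page the user visited."""
    return visited[-1]


if __name__ == "__main__":
    # Toy sessions invented for this example, not real labeled data.
    toy_sessions = [
        (["cheap flights", "hotel deals paris"], "paris hotel deals"),
        (["python tutorial"], "java tutorial"),
    ]
    print(evaluate(last_page_model, toy_sessions))
```

Passing the similarity function as a parameter keeps the loop unchanged when a different similarity estimate is substituted, so several candidate measures can be compared on the same labeled sessions.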