Response prediction using collaborative filtering with hierarchies and side-information

In online advertising, response prediction is the problem of estimating the probability that an advertisement is clicked when displayed on a content publisher's webpage. In this paper, we show how response prediction can be viewed as a matrix completion problem, and propose to solve it using matrix factorization techniques from collaborative filtering (CF). We point out two crucial differences between standard CF problems and response prediction: the requirement of predicting probabilities rather than scores, and the issue of confidence in matrix entries. We address these issues using a matrix factorization analogue of logistic regression, and by applying a principled confidence-weighting scheme to its objective. We show how this factorization can be seamlessly combined with explicit features or side-information for pages and ads, which lets us combine the benefits of both approaches. Finally, we combat the extreme sparsity of response prediction data by incorporating hierarchical information about the pages and ads into our factorization model. Experiments on three very large real-world datasets show that our model outperforms current state-of-the-art methods for response prediction.
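As a rough illustration of the core idea, the sketch below fits a logistic matrix factorization to a small page-by-ad matrix of click and view counts. It is a minimal sketch under stated assumptions, not the paper's implementation: the weighting shown (each impression treated as one Bernoulli trial, so a cell's loss is weighted by its view count) is one natural confidence scheme and may differ from the one used in the paper. The function names, hyperparameters, and toy counts are all hypothetical, and side-information and hierarchies are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def weighted_log_loss(U, V, clicks, views, reg):
    """Count-weighted log loss plus L2 penalty (assumed objective)."""
    P = sigmoid(U @ V.T)
    eps = 1e-12  # guards against log(0)
    nll = -(clicks * np.log(P + eps)
            + (views - clicks) * np.log(1.0 - P + eps)).sum()
    return nll / views.sum() + 0.5 * reg * ((U ** 2).sum() + (V ** 2).sum())


def fit_logistic_mf(clicks, views, rank=2, lr=0.5, reg=0.01,
                    epochs=200, seed=0):
    """Model P(click | page i, ad j) = sigmoid(U[i] . V[j]).

    Fitted with plain batch gradient descent. Cells with zero views
    carry zero weight, so they are treated as missing entries.
    """
    rng = np.random.default_rng(seed)
    n_pages, n_ads = views.shape
    U = 0.1 * rng.standard_normal((n_pages, rank))
    V = 0.1 * rng.standard_normal((n_ads, rank))
    total = views.sum()
    history = [weighted_log_loss(U, V, clicks, views, reg)]
    for _ in range(epochs):
        P = sigmoid(U @ V.T)
        # Derivative of the weighted loss w.r.t. each score U[i] . V[j].
        G = (views * P - clicks) / total
        grad_U = G @ V + reg * U
        grad_V = G.T @ U + reg * V
        U = U - lr * grad_U
        V = V - lr * grad_V
        history.append(weighted_log_loss(U, V, clicks, views, reg))
    return U, V, history


# Toy data (made up for illustration): rows are pages, columns are ads.
views = np.array([[1000.0, 500.0, 0.0],
                  [200.0, 0.0, 800.0],
                  [0.0, 300.0, 400.0]])
clicks = np.array([[50.0, 5.0, 0.0],
                   [2.0, 0.0, 80.0],
                   [0.0, 30.0, 4.0]])

U, V, history = fit_logistic_mf(clicks, views)
P = sigmoid(U @ V.T)  # predicted click probabilities, including unseen pairs
```

Because the sigmoid maps every score into (0, 1), the predictions are probabilities rather than unbounded scores, and the view-count weights let heavily observed cells influence the fit more than sparsely observed ones. The entries of `P` for page-ad pairs with zero views are the "completed" cells of the matrix.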
