Recommendations as Treatments: Debiasing Learning and Evaluation

Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself. In this paper, we provide a principled approach to handle selection biases by adapting models and estimation techniques from causal inference. The approach leads to unbiased performance estimators despite biased data, and to a matrix factorization method that provides substantially improved prediction performance on real-world data. We theoretically and empirically characterize the robustness of the approach, and find that it is highly practical and scalable.
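The claim that unbiased performance estimators can be built from biased data can be illustrated with inverse propensity scoring (IPS), a standard causal-inference estimator. The following is a minimal sketch, not the paper's implementation. It assumes the propensities P[u, i] (the probability that each rating is observed) are known. The selection model, the constant predictor, and the helper names `naive_mse`, `ips_mse`, and `simulate` are hypothetical choices for illustration. Each observed squared error is divided by its propensity, so its expectation over the observation process equals the error on the full rating matrix.

```python
import random

# Hypothetical selection model (assumption for illustration): higher ratings are
# more likely to be observed, i.e. the data are missing not at random (MNAR).
PROPENSITY = {1: 0.05, 2: 0.05, 3: 0.10, 4: 0.30, 5: 0.50}


def naive_mse(ratings, predictions, observed):
    """Mean squared error averaged over the observed entries only."""
    errors = [(ratings[k] - predictions[k]) ** 2 for k in observed]
    return sum(errors) / len(errors)


def ips_mse(ratings, predictions, observed, propensities, n_pairs):
    """Inverse-propensity-scored MSE.

    Each observed error is divided by its observation probability P[u, i], and
    the sum is normalized by the total number of user-item pairs, so that its
    expectation over the observation process equals the full-matrix MSE.
    """
    total = sum((ratings[k] - predictions[k]) ** 2 / propensities[k] for k in observed)
    return total / n_pairs


def simulate(n_users=50, n_items=40, trials=200, seed=0):
    rng = random.Random(seed)
    pairs = [(u, i) for u in range(n_users) for i in range(n_items)]
    # Synthetic ground-truth ratings and a hypothetical model predicting 4.5
    # everywhere: accurate on high ratings, poor on low ones.
    ratings = {k: rng.randint(1, 5) for k in pairs}
    predictions = {k: 4.5 for k in pairs}
    propensities = {k: PROPENSITY[ratings[k]] for k in pairs}

    # True MSE over every user-item pair (not available in practice).
    true_mse = sum((ratings[k] - predictions[k]) ** 2 for k in pairs) / len(pairs)

    naive_sum = ips_sum = 0.0
    for _ in range(trials):
        # Reveal each entry independently with its own propensity.
        observed = [k for k in pairs if rng.random() < propensities[k]]
        naive_sum += naive_mse(ratings, predictions, observed)
        ips_sum += ips_mse(ratings, predictions, observed, propensities, len(pairs))
    return true_mse, naive_sum / trials, ips_sum / trials


if __name__ == "__main__":
    true_mse, naive, ips = simulate()
    print(f"true MSE {true_mse:.3f}  naive {naive:.3f}  IPS {ips:.3f}")
```

In this setup the naive average substantially underestimates the true MSE, because the low ratings where the model is wrong are rarely observed. The IPS average stays close to the true value. In real data the propensities must themselves be estimated (for example, with a classifier over user and item features), and small propensities inflate the variance of the estimator. The same 1/P weights can also serve as per-example weights in a training loss, which is the general idea behind propensity-weighted learning.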
