Paper Information - Recommendations as Treatments: Debiasing Learning and Evaluation

Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself. In this paper, we provide a principled approach to handle selection biases by adapting models and estimation techniques from causal inference. The approach leads to unbiased performance estimators despite biased data, and to a matrix factorization method that provides substantially improved prediction performance on real-world data. We theoretically and empirically characterize the robustness of the approach, and find that it is highly practical and scalable.
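To make the idea of an unbiased performance estimator under selection bias concrete, the following is a minimal sketch of inverse-propensity-scored (IPS) evaluation, the standard causal-inference technique for this setting. The function names `naive_mse` and `ips_mse`, the array layout (users by items), and the assumption that the observation propensities are known and strictly positive are all illustrative choices made here, not details taken from the abstract.

```python
import numpy as np


def naive_mse(ratings, predictions, observed):
    """Mean squared error averaged over the observed entries only.

    This estimate is generally biased when entries are not observed
    uniformly at random.
    """
    err = (ratings - predictions) ** 2
    return err[observed].mean()


def ips_mse(ratings, predictions, observed, propensities):
    """Inverse-propensity-scored estimate of the full-matrix MSE.

    Each observed entry is reweighted by 1 / P(entry is observed), and the
    sum is divided by the total number of user-item pairs. Assumes every
    propensity is strictly positive (hypothetical, supplied by the caller).
    """
    err = (ratings - predictions) ** 2
    # Unobserved entries contribute zero; observed ones are upweighted.
    weighted = np.where(observed, err / propensities, 0.0)
    return weighted.sum() / ratings.size


if __name__ == "__main__":
    # Toy data: one user, two items; the well-predicted item is
    # observed far more often than the poorly predicted one.
    ratings = np.array([[5.0, 1.0]])
    predictions = np.array([[5.0, 3.0]])
    propensities = np.array([[0.8, 0.2]])
    observed = np.array([[True, True]])
    print("naive:", naive_mse(ratings, predictions, observed))
    print("IPS:  ", ips_mse(ratings, predictions, observed, propensities))
```

A single draw of the IPS estimate can be far from the true error; the unbiasedness claim concerns its expectation over observation patterns. In practice the propensities usually have to be estimated, and small propensities inflate the variance of the estimate.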
