Recommendations as Treatments

In recent years, a new line of research has taken an interventional view of recommender systems, treating recommendations as actions the system takes to bring about a desired effect. This perspective has led to the development of counterfactual inference techniques for evaluating and optimizing recommendation policies. This article explains how these techniques enable unbiased offline evaluation and learning despite biased logged data, and how they can inform considerations of fairness and equity in recommender systems.
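A standard instance of such a counterfactual estimator is inverse propensity scoring (IPS): logged interactions are reweighted by how much more (or less) likely the new policy is to take the logged action than the logging policy was. The sketch below is not from the article; the log format and all names are illustrative assumptions, chosen only to show the shape of the computation.

```python
import numpy as np

# Minimal sketch of IPS off-policy evaluation, assuming logs of
# (context, action, propensity, reward) tuples, where the propensity is
# the probability with which the logging policy chose that action.

def ips_estimate(logs, target_prob):
    """Estimate the expected reward of a new policy from logged data.

    target_prob(action, context) must return the probability that the
    new (target) policy would choose `action` in `context`.
    """
    weighted_rewards = [
        (target_prob(a, x) / p) * r  # importance weight corrects for the
        for (x, a, p, r) in logs     # mismatch between the two policies
    ]
    return np.mean(weighted_rewards)

# Hypothetical usage: the logging policy recommended one of 4 items
# uniformly at random (propensity 0.25); we evaluate a deterministic
# policy that always recommends item 2.
logs = [("u1", 2, 0.25, 1.0), ("u2", 0, 0.25, 0.0), ("u3", 2, 0.25, 0.0)]
print(ips_estimate(logs, lambda a, x: 1.0 if a == 2 else 0.0))
```

Because the estimator divides by logged propensities, it is unbiased but can have high variance when the two policies differ substantially; in practice the importance weights are often clipped or truncated, trading a small bias for a large reduction in variance.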
