Generalizing Off-Policy Learning under Sample Selection Bias

Learning personalized decision policies that generalize to the target population is of great relevance. Because training data is often not representative of the target population, standard policy learning methods may yield policies that do not generalize to the target population. To address this challenge, we propose a novel framework for learning policies that generalize to the target population. For this, we characterize the difference between the training data and the target population as a sample selection bias using a selection variable. Over an uncertainty set around this selection variable, we optimize the minimax value of a policy to achieve the best worst-case policy value on the target population. To solve the minimax problem, we derive an efficient algorithm based on a convex-concave procedure and prove convergence for parametrized spaces of policies such as logistic policies. We prove that, if the uncertainty set is well-specified, our policies generalize to the target population, as they cannot do worse on the target population than on the training data. Using simulated data and a clinical trial, we demonstrate that, compared to standard policy learning methods, our framework substantially improves the generalizability of policies.
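To make the worst-case policy value concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes, purely for illustration, that the uncertainty set is a box of per-sample bounds on the selection weights (the abstract does not specify its form) and that a per-sample estimate of the policy value is already available. The names `worst_case_value`, `values`, `w_lo`, and `w_hi` are hypothetical. Under these assumptions, the inner minimization is a linear fractional program whose solution has a threshold structure: samples with low value receive their upper weight bound, and the rest receive their lower bound.

```python
from typing import Sequence


def worst_case_value(values: Sequence[float],
                     w_lo: Sequence[float],
                     w_hi: Sequence[float]) -> float:
    """Minimum of sum(w * v) / sum(w) subject to w_lo[i] <= w[i] <= w_hi[i].

    Assumes all lower bounds are strictly positive, so the denominator
    never vanishes. Box bounds are an illustrative assumption.
    """
    # Sort sample indices by their estimated policy value (ascending).
    order = sorted(range(len(values)), key=lambda i: values[i])
    best = float("inf")
    # Try every threshold k: the k lowest-value samples get the upper
    # weight bound, the remaining samples get the lower weight bound.
    for k in range(len(values) + 1):
        num = 0.0
        den = 0.0
        for rank, i in enumerate(order):
            w = w_hi[i] if rank < k else w_lo[i]
            num += w * values[i]
            den += w
        best = min(best, num / den)
    return best


# Toy example: three samples, each weight may lie between 1 and 2.
values = [1.0, 2.0, 3.0]
print(worst_case_value(values, [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]))  # 1.75
```

A minimax policy learner would then choose the policy parameters that maximize this worst-case value. That outer maximization is where the convex-concave procedure mentioned in the abstract would come in, and it is not shown here.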
