Generalizing Off-Policy Learning under Sample Selection Bias

Learning personalized decision policies that generalize to the target population is of great relevance. Since training data is often not representative of the target population, standard policy learning methods may yield policies that do not generalize to the target population. To address this challenge, we propose a novel framework for learning policies that generalize to the target population. For this, we characterize the difference between the training data and the target population as a sample selection bias using a selection variable. Over an uncertainty set around this selection variable, we optimize the minimax value of a policy to achieve the best worst-case policy value on the target population. To solve the minimax problem, we derive an efficient algorithm based on a convex-concave procedure and prove convergence for parametrized spaces of policies such as logistic policies. We prove that, if the uncertainty set is well-specified, our policies generalize to the target population, as they cannot perform worse on it than on the training data. Using simulated data and a clinical trial, we demonstrate that, compared to standard policy learning methods, our framework substantially improves the generalizability of policies.
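The abstract does not spell out the uncertainty set or the estimator, so the following Python sketch only illustrates the inner, worst-case step of such a minimax objective under simplifying assumptions that are not taken from the paper: a box uncertainty set `lo_i <= w_i <= hi_i` on per-sample selection weights, a binary treatment with a known propensity, and an inverse-propensity-weighted (IPW) estimate of a logistic policy's value. The names `ipw_contributions` and `worst_case_mean` are hypothetical, and the outer maximization over the policy parameters `theta` (solved in the paper with a convex-concave procedure) is omitted. This is not the authors' algorithm.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ipw_contributions(X, t, r, theta, propensity):
    """Per-sample IPW terms for a logistic policy pi(x) = sigmoid(x @ theta).

    Assumption: t is the observed binary treatment, r the observed outcome,
    and propensity the known probability P(T=1 | x) in the training data.
    """
    pi = sigmoid(X @ theta)  # probability that the policy treats each unit
    weight = pi * t / propensity + (1.0 - pi) * (1.0 - t) / (1.0 - propensity)
    return r * weight


def worst_case_mean(y, lo, hi):
    """Minimum of sum(w * y) / sum(w) over 0 < lo <= w <= hi.

    The minimizer of this linear-fractional problem gives weight hi to the
    smallest terms and weight lo to the rest, so scanning the n + 1
    thresholds of the sorted terms yields the exact minimum.
    """
    y, lo, hi = (np.asarray(a, dtype=float) for a in (y, lo, hi))
    order = np.argsort(y)
    y, lo, hi = y[order], lo[order], hi[order]
    # num[k], den[k]: the k smallest terms get weight hi, the others weight lo.
    num = np.concatenate([[0.0], np.cumsum(hi * y)]) + np.concatenate(
        [np.cumsum((lo * y)[::-1])[::-1], [0.0]]
    )
    den = np.concatenate([[0.0], np.cumsum(hi)]) + np.concatenate(
        [np.cumsum(lo[::-1])[::-1], [0.0]]
    )
    return float(np.min(num / den))


# Usage on synthetic data (hypothetical setup, randomized treatment with p = 0.5).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
t = rng.integers(0, 2, size=200)
r = X[:, 0] * (2 * t - 1) + rng.normal(scale=0.1, size=200)
theta = np.array([1.0, 0.0, 0.0])

terms = ipw_contributions(X, t, r, theta, propensity=0.5)
nominal = terms.mean()  # standard IPW value on the training data
robust = worst_case_mean(terms, lo=np.full(200, 0.5), hi=np.full(200, 2.0))
```

Because `w_i = 1` lies inside the box whenever `lo_i <= 1 <= hi_i`, the robust value can never exceed the nominal IPW value; a policy learner in this spirit would maximize `robust` rather than `nominal` over `theta`.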