More Efficient Policy Learning via Optimal Retargeting

Abstract

Policy learning can be used to extract individualized treatment regimes from observational data in healthcare, civics, e-commerce, and beyond. One major hurdle to policy learning is a commonplace lack of overlap in the data for different actions, which can lead to unwieldy policy evaluation and poorly performing learned policies. We study a solution to this problem based on retargeting, that is, changing the population on which policies are optimized. We first argue that at the population level, retargeting may induce little to no bias. We then characterize the optimal reference policy and retargeting weights in both binary-action and multi-action settings, in terms of the asymptotic variance of the efficient estimator of the new learning objective. We further consider weights that additionally control for potential bias due to retargeting. Extensive empirical results in a simulation study and a case study of personalized job counseling demonstrate that retargeting is a simple way to significantly improve any policy learning procedure applied to observational data. Supplementary materials for this article are available online.
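To make the retargeting idea concrete, below is a minimal sketch of a retargeted, self-normalized inverse-propensity-weighted (IPW) policy-value estimator for the binary-action case. It assumes the logging propensities e(x) = P(A = 1 | X = x) are known or well estimated, and it uses overlap-style weights w(x) = e(x)(1 - e(x)) as a simple stand-in for the paper's variance-optimal retargeting weights. The function name, simulated data, and weight choice here are illustrative only, not taken from the paper's code.

import numpy as np

def retargeted_ipw_value(policy_actions, A, Y, e_hat):
    """Retargeted, self-normalized IPW value of a candidate binary policy.

    policy_actions : 0/1 actions the candidate policy assigns to each unit
    A, Y           : logged actions and observed outcomes
    e_hat          : estimated propensities P(A = 1 | X) for each unit
    """
    # Propensity of the action that was actually logged for each unit.
    prop_logged = np.where(A == 1, e_hat, 1.0 - e_hat)
    # Overlap-style retargeting weights: downweight regions where one
    # action is almost never observed, i.e., where overlap is poor.
    w = e_hat * (1.0 - e_hat)
    match = (policy_actions == A).astype(float)
    # Self-normalize so the estimate targets the mean outcome under the
    # policy on the reweighted (retargeted) population.
    num = np.sum(w * match * Y / prop_logged)
    den = np.sum(w * match / prop_logged)
    return num / den

# Toy usage on simulated logs where treatment helps only when X > 0.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-2.0 * X))          # logging propensities
A = rng.binomial(1, e)
Y = A * X + rng.normal(scale=0.5, size=n)   # Y(1) = X, Y(0) = 0 plus noise
pi = (X > 0).astype(int)                    # candidate personalized policy
print(retargeted_ipw_value(pi, A, Y, e))                    # personalized
print(retargeted_ipw_value(np.ones(n, dtype=int), A, Y, e)) # treat-all

The design intuition is that plain IPW divides by propensities near 0 or 1 in low-overlap regions, inflating variance; multiplying by w(x) = e(x)(1 - e(x)) cancels the worst of that inflation at the cost of shifting the target population, which is exactly the bias-variance tradeoff the paper analyzes.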
