Policy Learning With Observational Data

In many areas, practitioners seek to use observational data to learn a treatment assignment policy that satisfies application‐specific constraints, such as budget, fairness, simplicity, or other functional form constraints. For example, policies may be restricted to take the form of decision trees based on a limited set of easily observable individual characteristics. We propose a new approach to this problem motivated by the theory of semiparametrically efficient estimation. Our method can be used to optimize either binary treatments or infinitesimal nudges to continuous treatments, and can leverage observational data where causal effects are identified using a variety of strategies, including selection on observables and instrumental variables. Given a doubly robust estimator of the causal effect of assigning everyone to treatment, we develop an algorithm for choosing whom to treat, and establish strong guarantees for the asymptotic utilitarian regret of the resulting policy.
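
As a rough illustration of the last sentence, the sketch below forms doubly robust (augmented inverse-propensity weighted) scores for a binary treatment under selection on observables, then picks the rule that maximizes their signed sum within a small policy class. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function names `aipw_scores` and `best_threshold_policy` are hypothetical, the nuisance estimates (`mu0_hat`, `mu1_hat`, `e_hat`) are assumed to be supplied by the caller, and a single-covariate threshold rule stands in for the constrained policy classes (such as shallow decision trees) mentioned above.

```python
import numpy as np


def aipw_scores(y, w, mu0_hat, mu1_hat, e_hat):
    """Doubly robust (AIPW) score for each unit's treatment effect.

    y: outcomes, w: binary treatment indicators (0/1),
    mu0_hat / mu1_hat: estimated conditional mean outcomes without / with treatment,
    e_hat: estimated propensity scores P(W = 1 | X).
    All nuisance estimates are assumed to be supplied by the caller.
    """
    y, w = np.asarray(y, float), np.asarray(w, float)
    mu0_hat, mu1_hat = np.asarray(mu0_hat, float), np.asarray(mu1_hat, float)
    e_hat = np.asarray(e_hat, float)
    mu_w = np.where(w == 1, mu1_hat, mu0_hat)  # fitted value for the observed arm
    return (mu1_hat - mu0_hat) + (w - e_hat) / (e_hat * (1 - e_hat)) * (y - mu_w)


def best_threshold_policy(x, gamma):
    """Exhaustive search over rules 'treat if x > c' and 'treat if x <= c'.

    Maximizes mean((2 * treat - 1) * gamma): the estimated gain from treating
    the selected units, counting the scores of untreated units with a minus sign.
    Returns (value, c, sign), where sign = 1 means 'treat if x > c' and
    sign = -1 means 'treat if x <= c'.
    """
    x, gamma = np.asarray(x, float), np.asarray(gamma, float)
    best = (-np.inf, None, None)
    # Candidate cutoffs: every observed value, plus -inf (treat all / treat none).
    for c in np.concatenate(([-np.inf], np.unique(x))):
        above = x > c
        for sign in (1, -1):
            treat = above if sign == 1 else ~above
            value = np.mean((2 * treat - 1) * gamma)
            if value > best[0]:
                best = (value, c, sign)
    return best
```

In use, one would compute `gamma = aipw_scores(...)` from fitted nuisance models and pass it with a covariate to `best_threshold_policy`. Richer classes, budget constraints, or fairness constraints would replace the exhaustive threshold search with a constrained optimizer over the same scores.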
