Recursive Partitioning for Personalization using Observational Data

We study the problem of learning to choose, from m discrete treatment options (e.g., news item or medical drug), the one with the best causal effect for a particular instance (e.g., user or patient), where the training data consists of passive observations of covariates, treatment, and the outcome of the treatment. The standard approach to this problem is regress and compare: split the training data by treatment, fit a regression model in each split, and, for a new instance, predict all m outcomes and pick the best. By reformulating the problem as a single learning task rather than m separate ones, we propose a new approach based on recursively partitioning the data into regimes where different treatments are optimal. We extend this to an optimal partitioning approach that finds a globally optimal partition, achieving a compact, interpretable, and impactful personalization model. We develop new tools for validating and evaluating personalization models on observational data and use these to demonstrate the power of our novel approaches in a personalized medicine application and a job training application.
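
The regress-and-compare baseline described above can be illustrated with a minimal sketch. This is not the authors' code: the function names `fit_regress_and_compare` and `recommend`, the choice of ordinary least squares as the per-treatment regression, the convention that larger outcomes are better, and the synthetic data (with treatment assigned uniformly at random, so there is no confounding) are all assumptions made for illustration.

```python
import numpy as np

def fit_regress_and_compare(X, t, y, m):
    """Fit one least-squares model per treatment (hypothetical helper).

    X: (n, d) covariates, t: (n,) treatment labels in {0, ..., m-1},
    y: (n,) observed outcomes. Returns coefficients of shape (m, d + 1).
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coefs = []
    for k in range(m):
        mask = (t == k)  # split the training data by treatment
        beta, *_ = np.linalg.lstsq(X1[mask], y[mask], rcond=None)
        coefs.append(beta)
    return np.vstack(coefs)

def recommend(coefs, X_new):
    """Predict all m outcomes for each new instance and pick the largest."""
    X1 = np.column_stack([np.ones(len(X_new)), X_new])
    preds = X1 @ coefs.T  # shape (n_new, m)
    return preds.argmax(axis=1)

# Synthetic example (assumed): treatment 0 yields outcome x, treatment 1 yields -x.
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(-1.0, 1.0, size=(n, 1))
t = rng.integers(0, 2, size=n)
y = np.where(t == 0, X[:, 0], -X[:, 0]) + 0.01 * rng.normal(size=n)

coefs = fit_regress_and_compare(X, t, y, m=2)
choice = recommend(coefs, np.array([[0.8], [-0.8]]))
print(choice)
```

In this toy setup treatment 0 is better for positive x and treatment 1 for negative x, so the two test instances receive different recommendations. Note that the sketch fits m separate regressions, which is exactly the structure the abstract contrasts with its single-task partitioning approach.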
