Optimal personalized treatment rules for marketing interventions: A review of methods, a new proposal, and an insurance case study

In many important settings, subjects can show significant heterogeneity in response to a stimulus or treatment. For instance, a treatment that works for the overall population might be highly ineffective, or even harmful, for a subgroup of subjects with specific characteristics. Similarly, a new treatment may not be better than an existing treatment in the overall population, but there is likely a subgroup of subjects who would benefit from it. The notion that one size may not fit all is increasingly recognized in a wide variety of fields, ranging from economics to medicine. This has drawn significant attention to personalizing the choice of treatment so that it is optimal for each individual. An optimal personalized treatment is the one that maximizes the probability of a desirable outcome. We call the task of learning the optimal personalized treatment "personalized treatment learning". From the statistical learning perspective, this problem poses some challenges, primarily because the optimal treatment is unknown on a given training set. A number of statistical methods have been proposed recently to tackle this problem. However, to the best of our knowledge, there has been no attempt so far to provide a comprehensive view of these methods and to benchmark their performance. The purpose of this paper is twofold: i) to describe seven recently proposed methods for personalized treatment learning and compare their performance in an extensive numerical study, and ii) to propose a novel method, labeled causal conditional inference trees, and its natural extension to causal conditional inference forests. The results show that our proposed method often outperforms the alternatives in the numerical settings described in this article. We also illustrate an application of the proposed method using data from a large Canadian insurer for the purpose of selecting the best targets for cross-selling an insurance product.
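To make the idea of an optimal personalized treatment concrete, the following is a minimal sketch, not one of the methods reviewed or proposed in the paper. It assumes a randomized experiment with a binary outcome and a single categorical covariate, and it recommends, per segment, the treatment with the highest observed response rate. The function names, segment labels, and toy records are all hypothetical. Note that each subject appears under only one treatment, which is why the optimal treatment is never directly observed in the training data and has to be estimated from group-level comparisons.

```python
from collections import defaultdict


def fit_response_rates(records):
    """Estimate response rates from (segment, treatment, outcome) records.

    outcome is 0 or 1. Returns {segment: {treatment: observed response rate}}.
    """
    # counts[segment][treatment] = [successes, trials]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for segment, treatment, outcome in records:
        cell = counts[segment][treatment]
        cell[0] += outcome
        cell[1] += 1
    return {
        seg: {trt: s / n for trt, (s, n) in by_trt.items()}
        for seg, by_trt in counts.items()
    }


def recommend(rates):
    """Pick, per segment, the treatment with the highest estimated rate."""
    return {seg: max(by_trt, key=by_trt.get) for seg, by_trt in rates.items()}


# Hypothetical toy data: each subject is observed under exactly one treatment.
records = [
    ("young", "offer", 1), ("young", "offer", 1), ("young", "offer", 0),
    ("young", "control", 0), ("young", "control", 1), ("young", "control", 0),
    ("senior", "offer", 0), ("senior", "offer", 0), ("senior", "offer", 1),
    ("senior", "control", 1), ("senior", "control", 1), ("senior", "control", 0),
]

rates = fit_response_rates(records)
policy = recommend(rates)
print(policy)
```

In this toy data the offer helps one segment and hurts the other, so a single population-wide rule would be suboptimal for one of the two groups. With many covariates, such exhaustive cell-by-cell comparison is no longer feasible, which is the setting that motivates dedicated methods.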
