Subgroup identification for precision medicine: A comparative review of 13 methods

Natural heterogeneity in patient populations can make it very hard to develop treatments that benefit all patients. An important goal of precision medicine is therefore to identify patient subgroups that respond to treatment at a much higher (or lower) rate than the population average. Although many subgroup identification methods exist, there is no comprehensive comparative study of their statistical properties. We review 13 methods and use real‐world and simulated data to compare the performance of their publicly available software on seven criteria: (a) bias in selection of subgroup variables, (b) probability of false discovery, (c) probability of identifying correct predictive variables, (d) bias in estimates of subgroup treatment effects, (e) expected subgroup size, (f) expected true treatment effect of subgroups, and (g) subgroup stability. The results show that many methods fare poorly on at least one criterion.
