Multi-threshold Change Plane Model: Estimation Theory and Applications in Subgroup Identification

We propose a multithreshold change plane regression model that naturally partitions the observed subjects into subgroups with different covariate effects. The underlying grouping variable is a linear function of observed covariates, so multiple thresholds produce change planes in the covariate space. We contribute a novel two-stage estimation approach to determine the number of subgroups, the locations of the thresholds, and all other regression parameters. In the first stage we adopt a group selection principle to consistently identify the number of subgroups; in the second stage the threshold locations and model parameter estimates are refined by a penalized induced smoothing technique. Our procedure allows sparse solutions for moderate- or high-dimensional covariates. We further establish the asymptotic properties of the proposed estimators under appropriate technical conditions. We evaluate the performance of the proposed methods in simulation studies and illustrate them with two medical data examples. Our proposal for subgroup identification may find immediate application in personalized medicine.
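To make the structure described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one common way of writing such a model: a baseline linear effect plus an additional covariate effect that switches on each time the linear grouping index crosses a threshold. It also assumes the smoothing step replaces the hard indicator with a normal CDF of bandwidth `h`. All names here (`assign_subgroups`, `smoothed_indicator`, `simulate`, `beta`, `deltas`, `gamma`, `thresholds`, `h`) are hypothetical and chosen for illustration only.

```python
import math
import numpy as np


def assign_subgroups(index, thresholds):
    """Label each subject by how many thresholds its grouping index exceeds.

    With s thresholds this yields labels 0..s, i.e. s + 1 subgroups.
    """
    # side="left" counts thresholds strictly below each index value,
    # which matches the strict indicator 1(index > a).
    return np.searchsorted(np.sort(thresholds), index, side="left")


def smoothed_indicator(index, threshold, h):
    """Normal-CDF surrogate for 1(index > threshold); bandwidth h is assumed."""
    z = (np.asarray(index, dtype=float) - threshold) / h
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def simulate(n, beta, deltas, gamma, thresholds, rng, noise_sd=0.5):
    """Draw data from an assumed multithreshold change plane model.

    y = x'beta + sum_s x'delta_s * 1(z'gamma > a_s) + noise
    """
    p, q = beta.shape[0], gamma.shape[0]
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Z = rng.normal(size=(n, q))
    index = Z @ gamma  # linear grouping variable
    mean = X @ beta
    for delta, a in zip(deltas, thresholds):
        mean = mean + (X @ delta) * (index > a)
    y = mean + rng.normal(scale=noise_sd, size=n)
    return X, Z, y, index


rng = np.random.default_rng(0)
beta = np.array([1.0, 0.5])
deltas = [np.array([2.0, 0.0]), np.array([0.0, -1.5])]
gamma = np.array([0.6, 0.8])
thresholds = np.array([-0.5, 0.5])

X, Z, y, index = simulate(200, beta, deltas, gamma, thresholds, rng)
groups = assign_subgroups(index, thresholds)
```

Two thresholds on the index `Z @ gamma` split the covariate space by two parallel planes into three subgroups. The smoothed indicator is differentiable in both `gamma` and the threshold, which is the property a smoothing-based refinement stage would rely on; the group selection and penalization steps of the two-stage procedure are not sketched here.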
