Multiply robust subgroup identification for longitudinal data with dropouts via median regression

Abstract Subgroup identification is an important step towards precision medicine and has attracted great attention recently. Longitudinal data with dropouts arise often in medical research, yet little work on subgroup identification considers this data type. In this paper, we propose a new subgroup identification method for longitudinal data with dropouts, based on concave fusion penalization and median regression. To handle missingness, we introduce multiply robust weights, which allow multiple models for the probability of being observed. As long as one of these models is correctly specified, the proposed estimator achieves the oracle property in the presence of missingness. Furthermore, we develop an efficient algorithm and propose a modified Bayesian information criterion to select the penalization parameter. The asymptotic properties of the proposed method are established under some regularity conditions. Its numerical performance is illustrated in simulations, and the method is applied to quality of life data from a breast cancer trial.
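
To make the role of observation weights concrete, the following is a minimal sketch of a weighted median under dropout. It uses plain inverse probability weighting with a single, assumed-known observation probability per subject, not the multiply robust weights proposed in the paper, and it omits the fusion penalty and the longitudinal structure entirely. The function names `weighted_median` and `ipw_median` and the toy data are hypothetical.

```python
def weighted_median(values, weights):
    """Return a minimizer of sum_i w_i * |y_i - m| over m.

    The smallest value whose cumulative weight reaches half of the
    total weight minimizes the weighted absolute loss.
    """
    pairs = sorted(zip(values, weights))
    if not pairs:
        raise ValueError("weighted_median needs at least one value")
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for y, w in pairs:
        cumulative += w
        if cumulative >= total / 2:
            return y
    return pairs[-1][0]  # guard against floating-point round-off


def ipw_median(responses, observed, obs_prob):
    """Median of the observed responses, each weighted by 1 / P(observed).

    Assumption for illustration: obs_prob holds correctly specified
    observation probabilities (a single model, unlike the paper's
    multiply robust construction).
    """
    kept = [(y, 1.0 / p) for y, r, p in zip(responses, observed, obs_prob) if r]
    values = [y for y, _ in kept]
    weights = [w for _, w in kept]
    return weighted_median(values, weights)


# Toy data: the fourth subject dropped out; subjects four and five
# each had an (assumed) 50% chance of being observed.
responses = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [True, True, True, False, True]
obs_prob = [1.0, 1.0, 1.0, 0.5, 0.5]

naive = weighted_median([1.0, 2.0, 3.0, 5.0], [1.0, 1.0, 1.0, 1.0])
adjusted = ipw_median(responses, observed, obs_prob)
```

In this toy example the unweighted median of the observed responses is 2.0, while the weighted version recovers 3.0, the median of the complete data, because the observed subject with a low observation probability also stands in for the similar subject who dropped out.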
