Parallel subgroup analysis of high-dimensional data via M-regression

Identifying subgroup structures is an important problem in data analysis, because populations are often heterogeneous in practice. In this paper, we consider M-estimators combined with concave and pairwise fusion penalties, which can handle high-dimensional data containing outliers. The penalties are applied to both the covariate effects and the treatment effects, so that the estimation achieves variable selection and data clustering simultaneously. We propose an algorithm based on parallel computing to process relatively large datasets. We establish the convergence of the proposed algorithm, the oracle property of the penalized M-estimators, and the selection consistency of the proposed criterion. Our numerical study demonstrates that the proposed method is promising for efficiently identifying subgroups hidden in high-dimensional data.
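
As a rough illustration of the kind of objective described above, the following Python sketch adds a robust loss, a concave sparsity penalty on the covariate coefficients, and a concave pairwise fusion penalty on subject-specific effects. The Huber loss, the minimax concave penalty (MCP), the working model y_i = mu_i + x_i' beta + error, and all function and parameter names (`huber_loss`, `mcp`, `objective`, `lam1`, `lam2`, `gamma`) are assumptions made for illustration; the abstract does not specify the paper's exact formulation, and this sketch only evaluates an objective, it does not implement the parallel algorithm.

```python
import numpy as np

def huber_loss(r, delta=1.345):
    """Huber loss, one common M-estimation loss (assumed here for illustration)."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty evaluated elementwise at |t|."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def objective(y, X, mu, beta, lam1, lam2, gamma=3.0):
    """Hypothetical penalized M-regression objective.

    mu   : subject-specific effects (one per observation), fused pairwise
    beta : covariate coefficients, penalized for sparsity
    """
    resid = y - mu - X @ beta
    loss = huber_loss(resid).sum()
    sparsity = mcp(beta, lam1, gamma).sum()
    # All pairs i < j of subject-specific effects.
    i, j = np.triu_indices(len(mu), k=1)
    fusion = mcp(mu[i] - mu[j], lam2, gamma).sum()
    return loss + sparsity + fusion
```

In this sketch, observations whose fitted `mu` values are fused to the same value would be read as belonging to the same subgroup, while zero entries of `beta` correspond to covariates that are not selected.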
