Regularized k-means clustering of high-dimensional data and its asymptotic consistency

This is a copy of an article published in the Electronic Journal of Statistics © 2012 Institute of Mathematical Statistics at DOI: 10.1214/12-EJS668.
