
Quantile-based clustering

A new cluster analysis method, $K$-quantiles clustering, is introduced. $K$-quantiles clustering can be computed by a simple greedy algorithm in the style of the classical Lloyd's algorithm for $K$-means. It can be applied to large and high-dimensional datasets. It allows for within-cluster skewness and for internal variable scaling based on within-cluster variation. Different versions allow for different levels of parsimony and computational efficiency. Although $K$-quantiles clustering is conceived as nonparametric, it can be connected to a fixed partition model of generalized asymmetric Laplace distributions. The consistency of $K$-quantiles clustering is proved, and it is shown that $K$-quantiles clusters correspond to well-separated mixture components in a nonparametric mixture. In a simulation study, $K$-quantiles clustering is compared with a number of popular clustering methods and gives good results. A high-dimensional microarray dataset is clustered with $K$-quantiles clustering.
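As an illustration only, the Lloyd-style iteration mentioned in the abstract might be sketched as follows. The abstract does not spell out the algorithm, so several things here are assumptions of this sketch rather than the authors' specification: the check (pinball) loss as the point-to-centre discrepancy, a single fixed quantile level `theta` shared by all variables, no variable scaling, random data points as initial centres, and the hypothetical names `check_loss`, `empirical_quantile` and `k_quantiles`.

```python
import numpy as np


def check_loss(u, theta):
    """Check (pinball) loss: theta*u for u >= 0, (theta - 1)*u for u < 0."""
    return np.where(u >= 0, theta * u, (theta - 1.0) * u)


def empirical_quantile(a, theta):
    """Column-wise order statistic x_(ceil(theta*m)) of an (m, p) array.

    This order statistic is a minimiser of the summed check loss
    (assumed choice; other quantile definitions exist).
    """
    s = np.sort(a, axis=0)
    m = s.shape[0]
    idx = min(max(int(np.ceil(theta * m)) - 1, 0), m - 1)
    return s[idx]


def k_quantiles(X, k, theta=0.5, n_iter=100, seed=0):
    """Greedy alternation between assignment and quantile update (sketch)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Assumed initialisation: k distinct data points chosen at random.
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assignment step: summed check loss of every point to every centre.
        diff = X[:, None, :] - centers[None, :, :]   # shape (n, k, p)
        cost = check_loss(diff, theta).sum(axis=2)   # shape (n, k)
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                    # partition is stable
        labels = new_labels
        # Update step: per-variable theta-quantile within each cluster.
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:                     # keep old centre if empty
                centers[c] = empirical_quantile(members, theta)
    return labels, centers
```

With `theta=0.5` the centres are coordinate-wise medians and the loss is half the L1 distance. Values of `theta` away from 0.5 penalise deviations above and below the centre differently, which is one way to accommodate within-cluster skewness. Because the update step uses a minimiser of the check loss, neither step of this sketch increases the total loss.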

Laura Anderlucci | Cinzia Viroli | Christian Hennig
