Quantile-based clustering

A new cluster analysis method, $K$-quantiles clustering, is introduced. It can be computed by a simple greedy algorithm in the style of the classical Lloyd's algorithm for $K$-means, and it can be applied to large and high-dimensional datasets. It allows for within-cluster skewness and for internal variable scaling based on within-cluster variation. Different versions offer different levels of parsimony and computational efficiency. Although $K$-quantiles clustering is conceived as a nonparametric method, it can be connected to a fixed partition model of generalized asymmetric Laplace distributions. The consistency of $K$-quantiles clustering is proved, and it is shown that $K$-quantiles clusters correspond to well-separated mixture components in a nonparametric mixture. In a simulation study, $K$-quantiles clustering is compared with a number of popular clustering methods and performs well. As an application, a high-dimensional microarray dataset is clustered by $K$-quantiles.
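
To make the Lloyd-style greedy idea concrete, the following is a minimal sketch of one possible reading, not the authors' algorithm. It assumes a single fixed quantile level `theta` shared by all variables and clusters, uses the check (pinball) loss as the point-to-centre discrepancy, and omits the variable scaling and the different versions mentioned in the abstract. The names `check_loss`, `k_quantiles_sketch`, and `init_centers` are hypothetical.

```python
import numpy as np

def check_loss(u, theta):
    # Check (pinball) loss: theta * u for u >= 0, (theta - 1) * u for u < 0.
    return np.where(u >= 0, theta * u, (theta - 1.0) * u)

def k_quantiles_sketch(X, k, theta=0.5, n_iter=50, seed=0, init_centers=None):
    """Lloyd-style alternation with quantile centres (illustrative assumption).

    X: (n, p) array; theta: quantile level in (0, 1), assumed fixed here.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if init_centers is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.array(init_centers, dtype=float)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assignment step: each point goes to the centre with the smallest
        # summed check loss over the p variables.
        diffs = X[:, None, :] - centers[None, :, :]      # shape (n, k, p)
        cost = check_loss(diffs, theta).sum(axis=2)      # shape (n, k)
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition unchanged, so the alternation has converged
        labels = new_labels
        # Update step: the variable-wise empirical theta-quantile (the
        # ceil(theta * m)-th order statistic) minimises the summed check loss.
        for c in range(k):
            members = X[labels == c]
            m = len(members)
            if m > 0:
                idx = max(int(np.ceil(theta * m)) - 1, 0)
                centers[c] = np.sort(members, axis=0)[idx]
    return labels, centers
```

With `theta=0.5` the check loss is half the absolute deviation, so this sketch reduces to a $K$-medians-type procedure; other values of `theta` let the centre sit off the middle of a skewed cluster.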
