Semiparametric Clustering: A Robust Alternative to Parametric Clustering

Clustering aims to group data naturally according to the underlying data distribution. This distribution is typically estimated with either a parametric model (e.g., a Gaussian mixture) or a nonparametric one (e.g., kernel density estimation). Compared with nonparametric models, parametric models are statistically stable: a small perturbation of the data points leads to only a small change in the estimated density. However, parametric models are highly sensitive to outliers, because in their presence the data distribution departs substantially from the parametric assumptions. Given a parametric clustering algorithm, this paper shows how to turn it into a robust one. The idea is to replace the original parametric density with a semiparametric one. The high-density data that form the core of each cluster are modeled with the original parametric density, while the low-density data, which typically lie far from the cluster cores and may have an arbitrary shape, are modeled with a nonparametric density. A combination of parametric and nonparametric clustering algorithms then groups the data under this semiparametric density. From a robust-statistics point of view, the proposed method has good robustness properties. We evaluate the proposed algorithm on several synthetic data sets and 70 UCI data sets; the results indicate that the semiparametric method can significantly improve clustering performance.
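The abstract's core-vs-tail split can be illustrated with a minimal sketch: estimate the density nonparametrically, fit the parametric model only to the high-density core, and attach the remaining low-density points with a simple nonparametric rule. This is an assumption-laden illustration, not the paper's exact procedure; the KDE bandwidth, the quantile threshold `core_quantile`, and the nearest-core-point assignment are all illustrative choices.

```python
# Minimal sketch of the semiparametric idea (illustrative, not the paper's exact method):
# 1) estimate density nonparametrically, 2) fit a parametric model to the high-density
# "core" points, 3) attach low-density points to the cluster of their nearest core point.
import numpy as np
from sklearn.neighbors import KernelDensity, NearestNeighbors
from sklearn.mixture import GaussianMixture

def semiparametric_cluster(X, n_clusters, bandwidth=1.0, core_quantile=0.3):
    # Nonparametric density estimate (kernel density estimation).
    log_density = KernelDensity(bandwidth=bandwidth).fit(X).score_samples(X)

    # Split the data: points above the density threshold form the cluster cores.
    threshold = np.quantile(log_density, core_quantile)  # core_quantile is an assumed knob
    core_mask = log_density >= threshold
    X_core, X_tail = X[core_mask], X[~core_mask]

    # Parametric model (Gaussian mixture) fitted only on the high-density core.
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(X_core)
    labels = np.empty(len(X), dtype=int)
    labels[core_mask] = gmm.predict(X_core)

    # Low-density points inherit the label of their nearest core point
    # (a simple nonparametric assignment; the paper's rule may differ).
    if len(X_tail) > 0:
        nn = NearestNeighbors(n_neighbors=1).fit(X_core)
        nearest = nn.kneighbors(X_tail, return_distance=False).ravel()
        labels[~core_mask] = labels[core_mask][nearest]
    return labels
```

The quantile threshold trades robustness against efficiency: a larger low-density tail shields the parametric fit from outliers but leaves fewer points to estimate the cluster cores.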
