Cluster validity measure and merging system for hierarchical clustering considering outliers

Clustering algorithms have evolved to handle increasingly complex structures. However, measures that assess the quality of the resulting partitions are rare and have been developed only for specific algorithms. In this work, we propose a new cluster validity measure (CVM) to quantify the clustering performance of hierarchical algorithms that handle overlapping clusters of any shape in the presence of outliers. This work also introduces a cluster merging system (CMS) to group clusters that share outliers. When located in regions of cluster overlap, these outliers may be generated by a mixture of nearby cores. The proposed CVM and CMS are applied to hierarchical extensions of the Support Vector and Gaussian Process Clustering algorithms in both synthetic and real experiments. The results show that the proposed metrics help to select the appropriate level of hierarchy and the appropriate hyperparameters.

Highlights
Cluster validity measure for arbitrarily shaped clusters with outliers.
Cluster merging system grouping cluster cores based on the outliers' structure.
Truly hierarchical variants of support vector and Gaussian process clustering.
Benefits for unsupervised change detection applications are presented.
