Content driven clustering algorithm combining density and distance functions

Abstract. Density-based and distance-based clustering are two distinct approaches to the same problem. This contribution presents a novel algorithm that exploits the benefits of both. It does so not by combining the two approaches into a single notion, but by using whichever one best suits the aim of each step of the algorithm. Specifically, the Window Density Function is used to identify regions of high density, each of which contains either several clusters or part of a cluster. Affinity Propagation is then applied to obtain a group of clusters within each such region. Finally, these regions are merged to form the actual clusters. The proposed methodology is tested on a variety of synthetic and real-life datasets. On the majority of the datasets used, the proposed algorithm outperforms the other well-known algorithms with which it is compared.
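To make the first step more concrete, the following is a minimal sketch of a window-based density estimate. The abstract does not define the Window Density Function, so the definition used here (the number of data points inside an axis-aligned window centred on a point) is an assumption, and the names `window_density`, `dense_points`, `half_width` and `min_count` are hypothetical. The Affinity Propagation and merging steps are not shown.

```python
from typing import List, Sequence

Point = Sequence[float]


def window_density(points: List[Point], center: Point, half_width: float) -> int:
    """Count the points lying inside an axis-aligned window around `center`.

    Assumed definition: a point is inside the window when every coordinate
    differs from the centre's by at most `half_width`.
    """
    return sum(
        1
        for p in points
        if all(abs(pc - cc) <= half_width for pc, cc in zip(p, center))
    )


def dense_points(points: List[Point], half_width: float, min_count: int) -> List[Point]:
    """Keep the points whose window holds at least `min_count` points (itself included)."""
    return [p for p in points if window_density(points, p, half_width) >= min_count]


# Toy data: a tight group of three points, a pair, and an isolated point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
dense = dense_points(data, half_width=0.2, min_count=3)
```

With these illustrative parameters only the tight group of three points is retained as a high-density region. In the pipeline the abstract describes, a distance-based method such as Affinity Propagation would then be run inside each such region.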
