A density invariant approach to clustering

Abstract. Organizing data into sensible groups is called ‘data clustering.’ It remains an open research problem in many scientific fields: the literature offers neither a universal solution nor an absolute strategy for evaluating one. In this context, this paper makes the following three contributions. (1) A new method for finding ‘natural groupings,’ or clusters, in a data set is presented. For this purpose, a new term, ‘vicinity,’ is coined. Vicinity captures the idea of density together with the spatial distribution of data points in feature space. This notion has the potential to separate various types of clusters. In summary, the approach presented here is non-convex admissive (i.e., the convex hulls of the clusters found can intersect, which is desirable for non-convex clusters), cluster proportion and omission admissive (i.e., duplicating a cluster an arbitrary number of times or deleting a cluster does not alter the other clusters’ boundaries), scale covariant, consistent (shrinking within-cluster distances and enlarging inter-cluster distances does not affect the clustering results), and density invariant, but not rich (it does not generate exhaustive partitions of the data). (2) A strategy for automatically detecting the tunable parameters of the proposed ‘Vicinity Based Cluster Detection’ (VBCD) algorithm is presented. (3) A new internal evaluation index, called the ‘Space-Density Index’ (SDI), is presented for evaluating clustering results produced by any method. Experimental results reveal that VBCD captures the idea of ‘natural groupings’ better than the existing approaches. Also, the SDI evaluation scheme provides better judgment than earlier internal cluster validity indices.
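The scale covariance and consistency properties named in the abstract can be made concrete with a small check. The sketch below is illustrative only: it does not implement VBCD (the abstract does not specify the algorithm), and it substitutes plain single-linkage clustering with a fixed number of clusters as a stand-in method. The function names `consistency_transform`, `single_linkage`, and `to_labels`, the toy points, and the shrink and stretch factors are all assumptions introduced for illustration.

```python
from itertools import combinations


def consistency_transform(dist, labels, shrink=0.5, stretch=2.0):
    """Shrink within-cluster distances and enlarge inter-cluster distances.

    `dist` is a symmetric distance matrix (list of lists) and `labels`
    gives the cluster id of each point. The factors are hypothetical.
    """
    n = len(labels)
    out = [row[:] for row in dist]
    for i in range(n):
        for j in range(n):
            if i != j:
                factor = shrink if labels[i] == labels[j] else stretch
                out[i][j] = dist[i][j] * factor
    return out


def single_linkage(dist, k):
    """Stand-in clustering method (not VBCD): merge the two closest
    clusters until k remain. Returns a set of frozensets of indices."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    while len(clusters) > k:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(dist[i][j] for i in pair[0] for j in pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


def to_labels(partition, n):
    """Convert a partition into a per-point list of cluster ids."""
    labels = [0] * n
    for cid, cluster in enumerate(sorted(partition, key=min)):
        for i in cluster:
            labels[i] = cid
    return labels


# Toy one-dimensional data: two well-separated groups (assumed example).
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(p - q) for q in points] for p in points]
base = single_linkage(dist, k=2)

# Scale covariance: multiplying every distance by a constant.
scaled = [[3.0 * d for d in row] for row in dist]
scale_ok = single_linkage(scaled, k=2) == base

# Consistency: shrink within-cluster, enlarge inter-cluster distances.
transformed = consistency_transform(dist, to_labels(base, len(points)))
consistent_ok = single_linkage(transformed, k=2) == base

print(scale_ok, consistent_ok)
```

A method with these two properties should return the same partition for `dist`, `scaled`, and `transformed`. The same harness could be pointed at any other clustering function that accepts a distance matrix.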
