Local contrast as an effective means to robust clustering against varying densities

Most density-based clustering methods have difficulty detecting clusters of widely differing densities in a dataset. A recent density-based clustering algorithm, CFSFDP, appears to have mitigated the issue. However, by formalising the condition under which it fails, we show that CFSFDP still suffers from the same issue. To address it, we propose a new measure called Local Contrast, as an alternative to density, to find cluster centers and detect clusters. We then apply Local Contrast to CFSFDP to create a new clustering method, LC-CFSFDP, which is robust in the presence of varying densities. Our empirical evaluation shows that LC-CFSFDP outperforms CFSFDP and three other state-of-the-art variants of CFSFDP.
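As a rough illustration of the idea, the sketch below computes the two quantities usually associated with CFSFDP (a cutoff density and the distance to the nearest point with a higher score) and then swaps density for a local-contrast score. The abstract does not define Local Contrast, so the definition used here (the number of a point's k nearest neighbours that have lower density) is an assumption, as are the function names and the parameters `dc` and `k`. It is a minimal sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def pairwise_dist(X):
    # Euclidean distance between every pair of rows in X.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def cutoff_density(D, dc):
    # Number of other points closer than dc (the point itself is excluded).
    return (D < dc).sum(axis=1) - 1

def local_contrast(D, rho, k):
    # ASSUMED definition: how many of the k nearest neighbours
    # have strictly lower density than the point itself.
    D_no_self = D.copy()
    np.fill_diagonal(D_no_self, np.inf)  # never pick a point as its own neighbour
    knn = np.argsort(D_no_self, axis=1)[:, :k]
    return (rho[:, None] > rho[knn]).sum(axis=1)

def delta_to_higher(D, score):
    # Distance to the nearest point with a strictly higher score;
    # points with the top score get their largest distance instead.
    delta = np.empty(len(score))
    for i in range(len(score)):
        higher = score > score[i]
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return delta

# Toy 1-D data: three close points and one far away.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
D = pairwise_dist(X)
rho = cutoff_density(D, dc=1.5)
lc = local_contrast(D, rho, k=2)
delta_rho = delta_to_higher(D, rho)  # density-based ranking
delta_lc = delta_to_higher(D, lc)    # local-contrast-based ranking
```

In a density-peaks style method, candidate cluster centers are the points that score high on both the chosen measure and its `delta`. Because the assumed local-contrast score only compares a point with its own neighbourhood, it does not depend on the absolute density of the cluster the point belongs to.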
