A New Clustering Algorithm by Using Boundary Information

Many clustering algorithms, such as K-means, are not suitable for non-convex datasets, and the Affinity Propagation (AP) algorithm may merge adjacent points from two different classes into one cluster. To address these shortcomings, we propose a new clustering algorithm that uses boundary information. The idea is as follows. First, take the number of points contained in each point's neighborhood as its density, and treat the points whose density is below the average density as boundary points. Then count the boundary points. If their number is larger than a given threshold, clustering is carried out directly by transitive propagation (the transfer idea); otherwise, the boundary points are regarded as a wall between clusters. When a boundary point is encountered during transitive clustering, propagation stops there, and an unprocessed non-boundary point is selected to start the same process again, until all non-boundary points have been processed. This effectively prevents adjacent points of two different classes from being clustered together. Because clustering proceeds by transitive propagation, the proposed algorithm is applicable to non-convex datasets, and adopting different clustering schemes according to the number of boundary points widens its applicability. Experimental results on synthetic datasets and standard datasets show that the proposed algorithm is efficient.
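
The steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `boundary_clustering` and the parameters `eps` (neighborhood radius) and `threshold` (boundary-point count) are hypothetical, Euclidean distance is assumed, and because the abstract does not say how boundary points are finally assigned, the sketch simply leaves them labeled `-1` when they act as a wall.

```python
from collections import deque

import numpy as np


def boundary_clustering(X, eps, threshold):
    """Sketch of density/boundary-based transitive clustering.

    Assumptions (not from the paper): Euclidean eps-neighborhoods,
    and boundary points keep the label -1 when used as a wall.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)

    # Pairwise distances and eps-neighborhood membership (includes self).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = dist <= eps

    # Density = number of points in each point's neighborhood.
    density = neighbors.sum(axis=1)

    # Boundary points have density below the average density.
    is_boundary = density < density.mean()

    # More boundary points than the threshold: cluster directly.
    # Otherwise the boundary points act as a wall that stops propagation.
    use_wall = is_boundary.sum() <= threshold
    blocked = is_boundary if use_wall else np.zeros(n, dtype=bool)

    labels = np.full(n, -1)
    cluster = 0
    for seed in range(n):
        if blocked[seed] or labels[seed] != -1:
            continue
        # Transitive expansion: keep absorbing unlabeled, unblocked neighbors.
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            for q in np.flatnonzero(neighbors[p]):
                if labels[q] == -1 and not blocked[q]:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels, is_boundary
```

On two dense groups joined by a single sparse bridge point, a `threshold` at or above the boundary count keeps the groups apart, since propagation stops at the bridge, while a lower `threshold` lets the plain transitive pass merge everything reachable into one cluster.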
