A robust density-based clustering algorithm for multi-manifold structure

In real-world pattern recognition tasks, data with multi-manifold structure are ubiquitous and unpredictable, and clustering such data effectively is a challenging problem. In particular, it is not obvious how to design a similarity measure for multiple manifolds. In this paper, we address this problem by proposing a new manifold distance measure that better captures both local and global spatial manifold information. We also define a new local density estimate that accounts for the density characteristics of the data; it represents local density more accurately and is less sensitive to parameter settings. In addition, to select cluster centers automatically, we propose a two-phase exemplar determination method. Experiments on several synthetic and real-world datasets show that the proposed algorithm achieves higher clustering effectiveness and better robustness on data with varying density, multiple scales, and noise overlap.
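To illustrate why a manifold-aware distance can differ sharply from a Euclidean one, the sketch below computes shortest-path (geodesic) distances over a k-nearest-neighbor graph. This is the standard graph-geodesic construction, used here only as an assumed stand-in: the abstract does not specify the proposed distance measure, so the function names `knn_graph` and `geodesic_distances`, the choice `k = 2`, and the semicircle data are all hypothetical.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Sort all other points by Euclidean distance to point i.
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from `source`; unreachable nodes stay inf."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy data (assumed): 21 points sampled evenly along a unit semicircle.
points = [
    (math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)
]
graph = knn_graph(points, k=2)
geo = geodesic_distances(graph, source=0)

euclidean_end_to_end = math.dist(points[0], points[-1])
geodesic_end_to_end = geo[len(points) - 1]
print(euclidean_end_to_end, geodesic_end_to_end)
```

For the two endpoints of the semicircle, the straight-line distance is 2, while the graph distance follows the arc and approaches its length of π. A measure of this kind reflects global manifold shape, whereas the edge weights themselves carry local information.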
