Clustering with Local Density Peaks-Based Minimum Spanning Tree

Cluster analysis is widely used in statistics, machine learning, pattern recognition, image processing, and other fields. Discovering clusters with arbitrary shapes remains a major challenge for most existing clustering algorithms. Clustering algorithms based on the minimum spanning tree (MST) can discover clusters with arbitrary shapes, but they are time-consuming and susceptible to noise points. In this paper, we employ local density peaks (LDP) to represent the whole data set and define a shared neighbors-based distance between local density peaks to better measure the dissimilarity between objects on manifold data. On the basis of local density peaks and the new distance, we propose a novel MST-based clustering algorithm called LDP-MST. It first constructs an MST over the local density peaks and then repeatedly cuts the longest edge until a given number of clusters is found. Experimental results on synthetic and real data sets show that our algorithm is competitive with state-of-the-art methods when discovering clusters with complex structures.
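The final step described above (build an MST, then cut the longest edge until the requested number of clusters remains) can be sketched as follows. This is a minimal illustration, not the LDP-MST implementation: the abstract does not define how local density peaks are selected or how the shared neighbors-based distance is computed, so the sketch assumes the representative points are already given and substitutes plain Euclidean distance. The function names `mst_edges` and `cut_longest_edges` are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete graph over `points`.

    Assumption: Euclidean distance stands in for the paper's
    shared neighbors-based distance, which is not defined here.
    Returns a list of (weight, i, j) tuples with n - 1 entries.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the tree
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def cut_longest_edges(points, k):
    """Split the MST into k clusters by removing its k - 1 longest edges.

    Removing the k - 1 longest edges at once gives the same partition as
    repeatedly cutting the current longest edge k - 1 times.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")
    kept = sorted(mst_edges(points))[: n - k]  # keep the n - k shortest edges

    # Union-find over the kept edges yields the connected components.
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)

    # Map each component root to a compact label 0, 1, 2, ...
    label_of = {}
    return [label_of.setdefault(find(i), len(label_of)) for i in range(n)]


# Two well-separated groups of hypothetical representative points.
demo_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = cut_longest_edges(demo_points, k=2)
```

In this sketch, the single long edge bridging the two groups is the one removed, so the first three points share one label and the last three share another. Running the same procedure on representative points rather than on every object is what keeps the tree small; the noise handling and distance design that the abstract attributes to LDP-MST are not reproduced here.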
