A Fast Algorithm for Identifying Density-Based Clustering Structures Using a Constraint Graph

OPTICS is a state-of-the-art algorithm for visualizing density-based clustering structures of multi-dimensional datasets. However, OPTICS requires iterative distance computations for all objects and thus runs in O(n^2) time, making it unsuitable for massive datasets. In this paper, we propose constrained OPTICS (C-OPTICS), which quickly creates density-based clustering structures identical to those produced by OPTICS. C-OPTICS uses a bi-directional graph structure, which we refer to as the constraint graph, to reduce the number of unnecessary distance computations in OPTICS and thereby shorten the running time. In experimental evaluations with synthetic and real datasets, C-OPTICS significantly improves the running time compared with existing algorithms such as OPTICS, DeLi-Clu, and Speedy OPTICS (SOPTICS), while guaranteeing the quality of the density-based clustering structures.
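To make the quadratic cost concrete, the following is a minimal sketch of baseline OPTICS without any spatial index, not of C-OPTICS or its constraint graph, whose construction the abstract does not describe. The function name `optics_bruteforce`, its parameters, and the distance-call counter are assumptions introduced for illustration only.

```python
import heapq
import math


def optics_bruteforce(points, eps, min_pts):
    """Baseline OPTICS sketch: returns (ordering, reachability, distance_calls)."""
    n = len(points)
    reach = [math.inf] * n      # reachability distance of each object
    processed = [False] * n
    order = []                  # cluster ordering produced by OPTICS
    calls = [0]                 # number of distance computations performed

    def neighbors(i):
        # Full scan over all objects: n distance computations per query.
        out = []
        for j in range(n):
            calls[0] += 1
            d = math.dist(points[i], points[j])
            if d <= eps:
                out.append((d, j))
        return out

    def core_distance(nbrs):
        # Distance to the min_pts-th nearest neighbor (the object itself counts);
        # None means the object is not a core object.
        if len(nbrs) < min_pts:
            return None
        return sorted(d for d, _ in nbrs)[min_pts - 1]

    def update(nbrs, core, heap):
        # Lower the reachability of unprocessed neighbors where possible.
        for d, j in nbrs:
            if processed[j]:
                continue
            new_reach = max(core, d)
            if new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(heap, (new_reach, j))

    def expand(i, heap):
        processed[i] = True
        order.append(i)
        nbrs = neighbors(i)
        core = core_distance(nbrs)
        if core is not None:
            update(nbrs, core, heap)

    for start in range(n):
        if processed[start]:
            continue
        heap = []
        expand(start, heap)
        while heap:
            _, j = heapq.heappop(heap)
            if not processed[j]:    # skip stale heap entries
                expand(j, heap)

    return order, reach, calls[0]
```

Because each object triggers exactly one full neighborhood scan, the counter always ends at n * n, which is the cost that the constraint graph is intended to cut down.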
