An incremental density-based clustering framework using fuzzy local clustering

Abstract

This paper presents a novel incremental density-based clustering framework that uses a one-pass scheme, named Fuzzy Incremental Density-based Clustering (FIDC). Because FIDC employs one-pass clustering, in which each data point is processed once and then discarded, it can process large datasets with less computation time and memory than its density-based clustering counterparts. Fuzzy local clustering is employed in the local-cluster assignment process to reduce the clustering inconsistencies that arise from one-pass clustering. To improve clustering performance and simplify parameter selection, a modified valley-seeking algorithm is used to adaptively determine the outlier thresholds for generating the final clusters. FIDC can operate in both traditional and data-stream clustering modes. Experimental results show that FIDC outperforms state-of-the-art algorithms in both clustering modes.
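The one-pass scheme with fuzzy local-cluster assignment can be illustrated with a minimal sketch. This is not FIDC itself. The abstract does not specify how local clusters are represented or how memberships are computed, so the sketch makes hypothetical choices. Each local cluster is a weighted centroid (`LocalCluster`). A point influences only the clusters within a fixed `radius`. Memberships follow the fuzzy c-means formula with fuzzifier `m`. The modified valley-seeking step that sets outlier thresholds is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class LocalCluster:
    """Hypothetical local-cluster summary: a weighted centroid; raw points are never kept."""
    centroid: list[float]
    weight: float = 1.0


def fuzzy_memberships(distances: list[float], m: float = 2.0) -> list[float]:
    """Fuzzy c-means style memberships u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)); requires m > 1."""
    zero = [i for i, d in enumerate(distances) if d == 0.0]
    if zero:
        # The point lies exactly on one or more centroids: share membership among them.
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(distances))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exp for dj in distances) for di in distances]


def one_pass_update(clusters: list[LocalCluster], x, radius: float, m: float = 2.0) -> None:
    """Use point x for a single update of the local clusters (in place); x is not stored."""
    near = [(c, math.dist(c.centroid, x)) for c in clusters]
    near = [(c, d) for c, d in near if d <= radius]
    if not near:
        # No local cluster is close enough (hypothetical rule): x seeds a new one.
        clusters.append(LocalCluster(centroid=list(x)))
        return
    u = fuzzy_memberships([d for _, d in near], m)
    for (c, _), ui in zip(near, u):
        # Weighted incremental mean: (w*c + u*x) / (w + u) == c + u/(w + u) * (x - c).
        c.weight += ui
        step = ui / c.weight
        c.centroid = [ci + step * (xi - ci) for ci, xi in zip(c.centroid, x)]


def cluster_stream(stream, radius: float, m: float = 2.0) -> list[LocalCluster]:
    """Process every point of the stream exactly once and return the local clusters."""
    clusters: list[LocalCluster] = []
    for x in stream:
        one_pass_update(clusters, x, radius, m)
    return clusters
```

For example, `cluster_stream([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.05, 0.1)], radius=1.0)` yields two local clusters with weights 3 and 2. Memory grows with the number of local clusters rather than the number of points, which is the property the one-pass scheme relies on. A point within `radius` of several clusters splits its unit of weight among them instead of being forced into one, which is one way a fuzzy assignment can soften order-dependent decisions.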