HIBOG: Improving the clustering accuracy by ameliorating dataset with gravitation

Abstract: Clustering is an important technique applied in many fields. To obtain more accurate results, most researchers focus only on the clustering algorithm itself. This is not an optimal strategy, because each algorithm has its own advantages and disadvantages, and no single algorithm gives satisfactory results on all datasets. This paper focuses on the dataset instead and proposes HIBOG, a method that improves clustering accuracy by ameliorating datasets with gravitation. HIBOG moves similar objects closer together and dissimilar objects further apart, so the ameliorated datasets are friendlier to many clustering algorithms than the originals, and those algorithms obtain better results on more datasets. Although datasets are diverse, HIBOG copes with this diversity to some extent because it is robust to high-dimensional datasets, Gaussian-distributed datasets, shaped datasets, and datasets with highly overlapping clusters. We conducted numerous experiments on real-world datasets to verify the effectiveness of HIBOG. The experiments show that HIBOG improves the accuracy of different clustering algorithms, with accuracy increasing by an average of 113.4% (excluding the maximum and minimum). Moreover, compared with similar methods, HIBOG yields a much larger improvement in clustering accuracy and a dramatically shorter running time. We also conducted 360 experiments, each with different parameter values. They show that most values enable HIBOG to ameliorate datasets, and that HIBOG is highly robust to parameter adjustment.
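
The idea of ameliorating a dataset before clustering can be illustrated with a minimal sketch. The abstract does not give HIBOG's update rule, so the code below is not the authors' method: it swaps in a simplified attraction step in which each point moves a fraction of the way toward the mean of its k nearest neighbors. The function name `shrink_toward_neighbors` and the parameters `k`, `step`, and `iterations` are hypothetical and chosen only for illustration.

```python
import numpy as np


def shrink_toward_neighbors(X, k=5, step=0.3, iterations=3):
    """Pull each point toward the mean of its k nearest neighbors.

    Illustrative stand-in for a gravitation-style preprocessing step;
    it is NOT the HIBOG update rule. Requires k < number of points.
    """
    X = np.asarray(X, dtype=float).copy()  # never modify the caller's array
    for _ in range(iterations):
        # Pairwise Euclidean distances, shape (n, n).
        diff = X[:, None, :] - X[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
        # Indices of the k nearest neighbors of every point, shape (n, k).
        neighbors = np.argsort(dist, axis=1)[:, :k]
        # Mean position of those neighbors, shape (n, d).
        local_center = X[neighbors].mean(axis=1)
        # Move each point a fraction `step` of the way toward that mean.
        X = X + step * (local_center - X)
    return X
```

The returned array has the same shape as the input and could then be passed to any clustering algorithm in place of the original data. In this sketch, points in the same dense region contract toward each other, which is one way to realize the "similar objects get closer" effect described above. How HIBOG itself separates dissimilar objects is not covered here.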
