Curvature-based method for determining the number of clusters

Abstract. Determining the number of clusters is a research question that has attracted considerable interest in recent years. The majority of existing methods require parametric assumptions and substantial computation. In this paper we propose a simple yet powerful method for determining the number of clusters based on curvature. Our technique is computationally efficient and straightforward to implement. We compare our method with six other approaches on a wide range of simulated and real-world datasets. We also present the theoretical motivation underlying the proposed method.
