Estimating the Optimal Number of Clusters k in a Dataset Using Data Depth

This paper proposes a new method, called depth difference (DeD), for estimating the optimal number of clusters (k) in a dataset based on data depth. The DeD method estimates k before the actual clustering is performed. We define the depth within clusters, the depth between clusters, and the depth difference, and use them to determine the optimal value of k, which is an input parameter of the clustering algorithm. An experimental comparison with leading state-of-the-art alternatives demonstrates that the proposed DeD method outperforms them.
