A new algorithm for clustering based on kernel density estimation

ABSTRACT In this paper, we present ClusterKDE, a clustering algorithm based on univariate kernel density estimation. It consists of an iterative procedure in which, at each step, a new cluster is obtained by minimizing a smooth kernel function. Although we used the univariate Gaussian kernel in our applications, any smooth kernel function can be used. The proposed algorithm has the advantage of not requiring the number of clusters a priori. Furthermore, the ClusterKDE algorithm is simple, easy to implement, well-defined, and stops in a finite number of steps; that is, it always converges, independently of the initial point. We illustrate our findings with numerical experiments, obtained by implementing the algorithm in Matlab and applying it to practical problems. The results indicate that ClusterKDE is competitive and fast when compared with the well-known Clusterdata and K-means algorithms, which Matlab provides for clustering data.
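The general idea of clustering univariate data through a Gaussian kernel density estimate can be sketched as follows. This is not the ClusterKDE procedure itself, whose iterative steps the abstract does not spell out. It is a simplified one-pass variant that assumes clusters are separated at the local minima (valleys) of the estimated density, located on a grid rather than by a numerical optimizer. The function names, the grid size, and the Silverman rule-of-thumb bandwidth are all assumptions made for illustration.

```python
import math


def gaussian_kde(data, h):
    """Return f(x) = 1/(n*h) * sum_i phi((x - x_i)/h), with phi the standard normal pdf."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth (assumption: the abstract names no bandwidth selector)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 1.06 * std * n ** (-0.2)


def kde_valley_clusters(data, h=None, grid_size=512):
    """Label each point by the density valley interval it falls in.

    Hypothetical helper: evaluates the KDE on a uniform grid, takes grid
    points that are local minima as cut points, and assigns each observation
    the number of cut points lying to its left.
    """
    if h is None:
        h = silverman_bandwidth(data)
    density = gaussian_kde(data, h)
    lo, hi = min(data), max(data)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    vals = [density(x) for x in grid]
    # Interior grid points lower than the left neighbour and no higher than the right one.
    cuts = [
        grid[i]
        for i in range(1, grid_size - 1)
        if vals[i] < vals[i - 1] and vals[i] <= vals[i + 1]
    ]
    labels = [sum(1 for c in cuts if x > c) for x in data]
    return labels, cuts


if __name__ == "__main__":
    sample = [0.0, 0.1, 0.2, 0.15, 5.0, 5.1, 5.2, 4.9]
    labels, cuts = kde_valley_clusters(sample, h=0.5)
    print(labels, cuts)
```

As in the abstract, the number of clusters is not supplied in advance; here it follows from how many valleys the density estimate has, which in turn depends strongly on the bandwidth `h`. A rule-of-thumb bandwidth tends to oversmooth multimodal data, so in practice `h` would need more careful selection.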
