Clustering via nonparametric density estimation

Although Hartigan (1975) had already proposed identifying subpopulations with regions of high density of the underlying probability distribution, the development of methods for cluster analysis has largely moved in other directions for reasons of computational convenience. Current computational resources allow us to reconsider this formulation and to develop clustering techniques aimed directly at identifying local modes of the density. Given a set of observations, a nonparametric estimate of the underlying density function is constructed, and subsets of high-density points are formed through suitable manipulation of the associated Delaunay triangulation. The method is illustrated with some numerical examples.
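The general idea described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' procedure: it assumes an isotropic Gaussian kernel with a hand-picked bandwidth `h`, keeps a fixed fraction `frac` of the highest-density points, and treats the connected components of the Delaunay edges among the kept points as candidate clusters. The function names, the parameters `h` and `frac`, and the simulated data are all hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def kde_at_points(X, h):
    """Isotropic Gaussian kernel density estimate, evaluated at the sample points."""
    n, d = X.shape
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    norm = n * (2.0 * np.pi * h ** 2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * h ** 2)).sum(axis=1) / norm


def high_density_clusters(X, h, frac):
    """Group the `frac` highest-density points; all other points get label -1."""
    n = X.shape[0]
    dens = kde_at_points(X, h)
    keep = dens >= np.quantile(dens, 1.0 - frac)

    # Collect all vertex pairs (edges) of each simplex in the triangulation.
    simplices = Delaunay(X).simplices
    k = simplices.shape[1]
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    edges = np.vstack([simplices[:, [i, j]] for i, j in pairs])

    # Assumption: drop every edge that touches a low-density point.
    edges = edges[keep[edges[:, 0]] & keep[edges[:, 1]]]

    graph = coo_matrix(
        (np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(n, n)
    )
    _, comp = connected_components(graph, directed=False)

    # Relabel the components of the kept points as 0, 1, 2, ...
    labels = np.full(n, -1)
    _, inverse = np.unique(comp[keep], return_inverse=True)
    labels[keep] = inverse
    return labels, dens


# Simulated data (hypothetical): two well-separated Gaussian groups in the plane.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
blob = np.repeat([0, 1], 150)
X = centers[blob] + 0.5 * rng.standard_normal((300, 2))

labels, dens = high_density_clusters(X, h=0.5, frac=0.3)
print(np.unique(labels[labels >= 0]).size, "high-density groups found")
```

In this sketch the low-density points are simply left unlabelled. Both the bandwidth and the retained fraction are arbitrary choices here, and varying `frac` changes which groups appear.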

[1] Peter J. Rousseeuw et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1990.

[2] John M. Chambers et al. Programming With Data, 1998.

[3] P. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, 1987.

[4] E. Nadaraya. On Non-Parametric Estimates of Density Functions and Regression Curves, 1965.

[5] Hans-Peter Kriegel et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[6] Silvia Lanteri et al. Classification of olive oils from their fatty acid composition, 1983.

[7] David P. Dobkin et al. The quickhull algorithm for convex hulls, 1996, TOMS.

[8] Geoffrey J. McLachlan et al. Finite Mixture Models, 2019, Annual Review of Statistics and Its Application.

[9] A. Cuevas et al. Cluster analysis: a further approach based on density estimation, 2001.

[10] Werner Stuetzle et al. Estimating the Cluster Tree of a Density by Analyzing the Minimal Spanning Tree of a Sample, 2003, J. Classif.

[11] John A. Hartigan et al. Clustering Algorithms, 1975.

[12] Atsuyuki Okabe et al. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, 1992, Wiley Series in Probability and Mathematical Statistics.

[13] John Aitchison et al. The Statistical Analysis of Compositional Data, 1986.

[14] S. Hu. The strong uniform consistency of kernel density estimates for φ-mixing sample, 1993.

[15] Hans-Peter Kriegel et al. OPTICS: ordering points to identify the clustering structure, 1999, SIGMOD '99.

[16] Adrian Bowman et al. Density based exploration of bivariate data, 1993.

[17] B. Silverman. Density estimation for statistics and data analysis, 1986.

[18] J. Gower. Some distance properties of latent root and vector methods used in multivariate analysis, 1966.

[19] Richard D. Deveaux et al. Applied Smoothing Techniques for Data Analysis, 1999, Technometrics.

[20] L. Hubert et al. Comparing partitions, 1985.

[21] A. Cuevas et al. Estimating the number of clusters, 2000.