Stability of density-based clustering

High-density clusters can be characterized by the connected components of a level set L(λ) = {x : p(x) > λ} of the underlying probability density function p generating the data, at some appropriate level λ ≥ 0. The complete hierarchical clustering can be characterized by a cluster tree T = ∪_λ L(λ). In this paper, we study the behavior of a density level set estimate L̂(λ) and a cluster tree estimate T̂ based on a kernel density estimator with kernel bandwidth h. We define two notions of instability to measure the variability of L̂(λ) and T̂ as a function of h, and investigate the theoretical properties of these instability measures.
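To make the objects in the abstract concrete, the following is a minimal one-dimensional sketch in Python, not the paper's procedure. It evaluates a Gaussian kernel density estimate p̂_h on a grid, thresholds it to obtain the plug-in level set {x : p̂_h(x) > λ}, and counts the connected components of that set on the grid. The function names, the simulated two-component data, and `split_instability` are assumptions made for illustration. In particular, `split_instability` is a hypothetical sample-splitting proxy (the fraction of grid points on which level sets from two random halves disagree) and should not be read as either of the instability measures defined in the paper.

```python
import numpy as np

def gaussian_kde_1d(data, grid, h):
    """Evaluate a Gaussian kernel density estimate with bandwidth h on grid."""
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def level_set_mask(data, grid, h, lam):
    """Boolean mask of grid points x with p_hat_h(x) > lam."""
    return gaussian_kde_1d(data, grid, h) > lam

def count_components(mask):
    """Count maximal runs of True, i.e. connected components on a 1-D grid."""
    m = np.asarray(mask).astype(int)
    # A component starts at index 0 if m[0] is 1, or wherever m goes 0 -> 1.
    return int(m[0] + np.sum(np.diff(m) == 1))

def split_instability(data, grid, h, lam, rng):
    """Hypothetical proxy (an assumption, not the paper's definition):
    fraction of grid points where level sets from two random halves disagree."""
    perm = rng.permutation(len(data))
    half = len(data) // 2
    a = level_set_mask(data[perm[:half]], grid, h, lam)
    b = level_set_mask(data[perm[half:]], grid, h, lam)
    return float(np.mean(a ^ b))

# Simulated data (an assumption): equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200)])
grid = np.linspace(-5, 5, 501)

for h in (0.1, 0.3, 1.0, 3.0):
    mask = level_set_mask(data, grid, h, lam=0.05)
    print(f"h={h}: components={count_components(mask)}, "
          f"split disagreement={split_instability(data, grid, h, 0.05, rng):.3f}")
```

Running the loop over several bandwidths shows how both the number of estimated clusters and the disagreement between half-sample level sets depend on h, which is the kind of variability the abstract proposes to quantify.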
