Clustering with Confidence: A Binning Approach

We present a plug-in method for estimating the cluster tree of a density. The method exploits the fact that the level sets of a piecewise constant density estimate can be computed exactly. We then introduce clustering with confidence, an automatic pruning procedure that assesses the significance of splits (and thereby of clusters) in the cluster tree; the only user input required is the desired confidence level.
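The binning idea can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes 2-D data inside the unit square, a square grid of equal-width bins, and that two bins are adjacent when they share an edge (4-neighbourhood). The function names `bin_density`, `level_set_components`, and `cluster_tree` are hypothetical. Because the histogram estimate is constant on each bin, every level set is an exact union of bins, and the level sets only change at the finitely many values the estimate takes. The confidence-based pruning step is not shown.

```python
from collections import deque


def bin_density(points, bins_per_axis, lo=0.0, hi=1.0):
    """Piecewise constant (histogram) density estimate on a square grid.

    Assumes 2-D points inside [lo, hi]^2; returns a dict mapping (i, j)
    bin indices of non-empty bins to their estimated density.
    """
    n = len(points)
    width = (hi - lo) / bins_per_axis
    counts = {}
    for x, y in points:
        # Clamp so that a point lying exactly on `hi` falls in the last bin.
        i = min(int((x - lo) / width), bins_per_axis - 1)
        j = min(int((y - lo) / width), bins_per_axis - 1)
        counts[(i, j)] = counts.get((i, j), 0) + 1
    # count / (n * bin area), so the estimate integrates to 1.
    return {b: c / (n * width * width) for b, c in counts.items()}


def level_set_components(density, lam):
    """Connected components of the level set {density >= lam}.

    The level set is exactly the union of bins whose density is >= lam.
    Edge-sharing adjacency (4-neighbourhood) is an illustrative assumption.
    """
    active = {b for b, d in density.items() if d >= lam}
    seen, comps = set(), []
    for start in sorted(active):
        if start in seen:
            continue
        # Breadth-first search over adjacent active bins.
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            i, j = queue.popleft()
            comp.add((i, j))
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in active and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        comps.append(frozenset(comp))
    return comps


def cluster_tree(density):
    """Return (level, component, parent) triples, lowest level first.

    Only the distinct density values need to be visited, since the level
    sets are unchanged between them. Level sets are nested, so each
    component lies inside exactly one component of the previous level,
    which is recorded as its parent (None at the lowest level).
    """
    nodes, prev = [], []
    for lam in sorted(set(density.values())):
        comps = level_set_components(density, lam)
        for comp in comps:
            parent = next((p for p in prev if comp <= p), None)
            nodes.append((lam, comp, parent))
        prev = comps
    return nodes
```

In this representation a split occurs where one component has two or more child components at the next level; a component that merely shrinks or persists appears as a chain of single children.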
