Consistency and Rates for Clustering with DBSCAN

We propose a simple and efficient modification of the popular DBSCAN clustering algorithm. This modification is able to detect the most interesting vertical threshold level in an automated, data-driven way. We establish both consistency and optimal learning rates for this modification.
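For orientation, the baseline that the abstract refers to can be sketched as follows. This is a minimal, brute-force version of standard DBSCAN, not the modification proposed here. The names `dbscan`, `eps`, `min_pts`, and `NOISE` are illustrative choices, and both parameters are assumed to be fixed by hand.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Baseline DBSCAN sketch: one label per point, NOISE for outliers."""
    n = len(points)
    # Brute-force eps-neighborhoods; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # j is core: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this sketch the result depends entirely on the hand-picked `eps` and `min_pts`. The modification described above is stated to select the threshold level from the data instead, and the sketch does not attempt that step.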
