Density-Based Clustering with Kernel Diffusion

Finding a suitable density function is essential for density-based clustering algorithms such as DBSCAN and DPC. These algorithms commonly use a naive density corresponding to the indicator function of a unit d-dimensional Euclidean ball. Such a density struggles to capture local features in complex datasets. To tackle this issue, we propose a new kernel diffusion density function that adapts to data with varying local distributional characteristics and smoothness. Furthermore, we develop a surrogate that can be computed efficiently in linear time and space, and we prove that it is asymptotically equivalent to the kernel diffusion density function. Extensive experiments on benchmark and large-scale face image datasets show that the proposed approach not only achieves a significant improvement over classic density-based clustering algorithms but also outperforms state-of-the-art face clustering methods by a large margin.
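To make the naive density concrete, the following is a minimal sketch of a ball-indicator density: for each point, the fraction of points that fall inside a Euclidean ball around it. The function name `naive_ball_density`, the `radius` parameter, and the toy data are illustrative assumptions rather than anything specified in the paper, and the constant ball-volume normalization is omitted. The proposed kernel diffusion density is not reproduced here.

```python
import numpy as np


def naive_ball_density(points, radius=1.0):
    """Fraction of points within `radius` of each point (self included).

    Hypothetical helper for illustration: this is the indicator-of-a-ball
    density up to a constant factor (the ball volume is not divided out).
    """
    points = np.asarray(points, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, d).
    diffs = points[:, None, :] - points[None, :, :]
    # Squared Euclidean distances: shape (n, n).
    sq_dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    # Indicator of the ball, averaged over all points.
    inside = sq_dists <= radius ** 2
    return inside.sum(axis=1) / len(points)


# Toy data (assumed): three nearby points and one isolated point.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
density = naive_ball_density(points, radius=1.0)
print(density)
```

Because a single fixed radius is applied everywhere, this density treats every region at the same scale, which is one way to see why it can miss local structure when clusters differ in density. The pairwise computation in this sketch takes quadratic time and memory, so it is suitable only for small examples.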
