LG: A clustering framework supported by point proximity relations

Abstract

Clustering is a research problem that rests on the proximity relations among data points, yet existing algorithms do not fully exploit these relations. In this paper, we present LG, a novel two-stage framework built on two proposed strategies that are closely tied to the proximity relations of the data points: Local Energy Gradient Oppression (LEGO) and Guide Point Assignation (GPA). In the LG framework, locating appropriate cluster centers is crucial for the subsequent assignment of data labels. We therefore introduce a nuclear model that views the dataset as a collection of charged particles. This model forms the basis of LEGO, and the points with locally maximal potential energy are identified as cluster centers. In addition, the GPA strategy adopts the novel idea that each cluster center actively selects the data points belonging to its own cluster, which keeps the LG framework effective on datasets with arbitrarily shaped distributions. The advantages of the proposed framework and of its two strategies are demonstrated on four synthetic datasets and three real-world face image datasets in terms of two clustering performance metrics.
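The abstract describes LEGO and GPA only at a high level, so the following is a minimal, hypothetical sketch of the two-stage idea rather than the authors' implementation. It assumes a Coulomb-style potential in which each point's energy is the sum of inverse distances to its k nearest neighbours. Points whose energy exceeds that of all their neighbours are treated as centers. Each center then grows its cluster by claiming unlabelled k-nearest neighbours breadth-first. The function names (`potential_energy`, `find_centers`, `assign_from_centers`), the parameter `k`, the inverse-distance potential and the nearest-labelled-point fallback are all assumptions; the paper's actual energy function and gradient-suppression rule are not given here.

```python
import numpy as np
from collections import deque


def potential_energy(X, k=10, eps=1e-12):
    """ASSUMED potential: sum of 1/distance over each point's k nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]           # indices of the k nearest neighbours
    energy = (1.0 / (np.take_along_axis(d, nn, axis=1) + eps)).sum(axis=1)
    return energy, nn, d


def find_centers(energy, nn):
    """Centers = points whose energy is a strict local maximum over their k-NN."""
    centers = [i for i in range(len(energy)) if np.all(energy[i] > energy[nn[i]])]
    if not centers:                             # guard against ties: keep the global maximum
        centers = [int(np.argmax(energy))]
    return centers


def assign_from_centers(nn, centers, d):
    """Hypothetical center-driven assignment: centers claim neighbours breadth-first."""
    labels = np.full(nn.shape[0], -1)
    queue = deque()
    for c, idx in enumerate(centers):
        labels[idx] = c
        queue.append(idx)
    while queue:
        p = queue.popleft()
        for q in nn[p]:                         # each claimed point recruits its own k-NN
            if labels[q] == -1:
                labels[q] = labels[p]
                queue.append(q)
    # Fallback (assumption): points never reached inherit the nearest labelled point's label.
    labelled = np.where(labels != -1)[0]
    for u in np.where(labels == -1)[0]:
        labels[u] = labels[labelled[np.argmin(d[u, labelled])]]
    return labels
```

A usage example on two well-separated Gaussian blobs:

```python
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(20, 1, (50, 2))])
energy, nn, d = potential_energy(X, k=15)
centers = find_centers(energy, nn)
labels = assign_from_centers(nn, centers, d)
```

With this simple local-maximum rule, one blob may yield more than one center. A suppression step such as the one LEGO's name suggests would be needed to merge spurious maxima, and this sketch does not attempt to reproduce it.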
