High-dimensional data clustering by using local affine/convex hulls

Abstract In this paper we propose a novel clustering algorithm that uses local affine/convex hulls for high-dimensional data clustering. In high-dimensional spaces, sparse and irregular sample distributions make nearest-neighbor distances unreliable (the hole artifact), which deteriorates clustering performance. There is therefore a need to fill in the gaps between nearest samples. To this end, we use the local affine/convex hull of the nearest neighbors of a given sample to fill in these holes, which greatly improves the reliability of the Euclidean distance metric and, in turn, the clustering accuracy. The proposed method can also be seen as a local extension of the well-known iterative subspace clustering algorithms, in which an entire cluster is approximated with a single linear/affine subspace. Experimental results show that the proposed method is efficient and that it outperforms other subspace clustering algorithms on a wide range of datasets.
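The core geometric step the abstract describes is measuring a query sample's distance to the local affine hull spanned by its nearest neighbors, instead of the raw nearest-neighbor distance. Below is a minimal sketch of that step, assuming a Euclidean feature space; the helper name affine_hull_distance and the plain least-squares formulation are illustrative and not taken from the paper. The convex-hull variant additionally constrains the combination weights to be nonnegative and sum to one, which turns the projection into a small quadratic program.

```python
import numpy as np

def affine_hull_distance(x, neighbors):
    """Distance from point x to the affine hull of its nearest neighbors.

    The affine hull of {n_1, ..., n_k} is {sum_i w_i n_i : sum_i w_i = 1}.
    Writing all points relative to n_1 reduces the projection to an
    ordinary least-squares problem over the difference vectors.
    """
    base = neighbors[0]
    # Columns span the affine hull (difference vectors relative to base).
    A = (neighbors[1:] - base).T          # shape (d, k-1)
    b = x - base
    # Least-squares projection of b onto the column space of A.
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    residual = b - A @ coeffs
    return np.linalg.norm(residual)

# Toy usage: distance from a query to the local affine hull of its
# 3 nearest neighbors inside one candidate cluster (synthetic data).
rng = np.random.default_rng(0)
cluster = rng.normal(size=(50, 10))       # 50 samples in a 10-D space
x = rng.normal(size=10)
idx = np.argsort(np.linalg.norm(cluster - x, axis=1))[:3]
print(affine_hull_distance(x, cluster[idx]))
```

In a clustering loop, this point-to-hull distance would replace the point-to-point distance when assigning samples to clusters, which is how the local hulls fill the holes left by sparse high-dimensional sampling.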
