Enhancing subspace clustering based on dynamic prediction

In high-dimensional data, many dimensions are irrelevant to each other and clusters are usually hidden under noise. As an important extension of traditional clustering, subspace clustering can be used to simultaneously cluster high-dimensional data into several subspaces and associate the low-dimensional subspaces with the corresponding points. A crucial step in subspace clustering is to construct an affinity matrix with block-diagonal form, in which the blocks correspond to different clusters. Distance-based methods and representation-based methods are the two major types of approaches for building an informative affinity matrix. In general, it is the difference between the density inside and outside the blocks that determines the efficiency and accuracy of the clustering. In this work, we introduce a well-known approach from statistical physics, namely link prediction, to enhance subspace clustering by reinforcing the affinity matrix. More importantly, we introduce the idea of combining complex network theory with machine learning. By revealing the hidden links inside each block, we maximize the density of each block along the diagonal while keeping the remaining off-block regions of the affinity matrix as sparse as possible. Our method shows remarkably improved clustering accuracy compared with existing methods on well-known datasets.
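The idea of reinforcing an affinity matrix through link prediction can be illustrated with a minimal sketch. The abstract does not say which link predictor is used, so the common-neighbors index below is an assumption chosen for simplicity, and the names `reinforce_affinity`, `toy_affinity`, and `min_common` are hypothetical. The sketch treats a binarized affinity matrix as a graph, scores every unlinked pair by its number of shared neighbors, and adds a link when the score reaches a threshold.

```python
import numpy as np


def reinforce_affinity(A, min_common=2):
    """Add predicted links to a binary affinity matrix.

    Assumption: links are predicted with the common-neighbors index,
    which is only one of many possible link-prediction scores.
    """
    A = np.asarray(A, dtype=int)
    A = ((A + A.T) > 0).astype(int)   # symmetrize
    np.fill_diagonal(A, 0)            # ignore self-links
    scores = A @ A                    # scores[i, j] = shared neighbors of i and j
    candidates = (A == 0) & (scores >= min_common)
    np.fill_diagonal(candidates, False)
    return A | candidates.astype(int)


def toy_affinity():
    """Two 4-point clusters, each missing one within-cluster link."""
    block = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
    block[0, 3] = block[3, 0] = 0     # the hidden link to be recovered
    A = np.zeros((8, 8), dtype=int)
    A[:4, :4] = block
    A[4:, 4:] = block
    return A


if __name__ == "__main__":
    print(reinforce_affinity(toy_affinity()))
```

In this toy case the two missing within-block links are filled in because their endpoints share two neighbors, while pairs from different blocks share none and stay unlinked, so the blocks become denser and the off-block region remains sparse. Real affinity matrices are weighted and noisy, so the threshold and the choice of score would need tuning.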
