Integrate and Conquer

In this article, we introduce a general methodology, called integrate and conquer, that accomplishes feature extraction, manifold construction, and clustering simultaneously, an approach we take to be superior to treating clustering as a single, isolated task. Applied to two-dimensional (2D) data, the methodology naturally induces a new clustering method that is highly effective on such data. Existing clustering algorithms usually convert 2D data to vectors in a preprocessing step, which severely damages 2D spatial information and discards inherent structures and correlations in the original data. The induced clustering method overcomes these vectorization-related issues and thereby improves clustering performance on 2D matrices. More specifically, the proposed methodology integrates three tasks seamlessly so that they enhance one another: finding subspaces, learning manifolds, and constructing the data representation. For 2D data, we seek two projection matrices with optimal numbers of directions that project the data into low-rank, noise-mitigated, and maximally expressive subspaces; in these subspaces, manifolds are adaptively updated according to the projections, and a new data representation is built from the projected data, accounting for nonlinearity via the adaptive manifolds. Consequently, the learned subspaces and manifolds are clean and intrinsic, and the new data representation is discriminative and robust. Extensive experiments confirm the effectiveness of the proposed methodology and algorithm.
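To make the idea of projecting 2D data with two projection matrices (rather than vectorizing it) more concrete, here is a minimal sketch. It is not the proposed algorithm: it uses a plain PCA-style eigendecomposition of the row- and column-direction covariances as a stand-in for the learned projections, and it omits the manifold learning and clustering steps entirely. The function name `two_sided_projection` and the parameters `r` and `c` (numbers of directions) are hypothetical names chosen for illustration.

```python
import numpy as np

def two_sided_projection(X, r, c):
    """Project a stack of matrices X of shape (n, a, b) to shape (n, r, c).

    Illustrative stand-in only: P and Q are taken as the leading
    eigenvectors of the row- and column-direction covariance matrices,
    which is an assumption, not the method described in the abstract.
    """
    Xc = X - X.mean(axis=0)                     # center the samples
    n = len(X)
    # Row-direction covariance: (1/n) * sum_i Xc_i @ Xc_i.T, shape (a, a)
    G_row = np.einsum('nij,nkj->ik', Xc, Xc) / n
    # Column-direction covariance: (1/n) * sum_i Xc_i.T @ Xc_i, shape (b, b)
    G_col = np.einsum('nji,njk->ik', Xc, Xc) / n
    # eigh returns eigenvalues in ascending order, so reverse the columns
    _, U = np.linalg.eigh(G_row)
    _, V = np.linalg.eigh(G_col)
    P = U[:, ::-1][:, :r]                       # (a, r), orthonormal columns
    Q = V[:, ::-1][:, :c]                       # (b, c), orthonormal columns
    # Y_i = P.T @ Xc_i @ Q keeps each sample as a (smaller) matrix
    Y = np.einsum('ar,nab,bc->nrc', P, Xc, Q)
    return P, Q, Y

# Usage on synthetic data: 20 samples, each an 8 x 6 matrix
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 8, 6))
P, Q, Y = two_sided_projection(X, r=3, c=2)
print(P.shape, Q.shape, Y.shape)
```

Each sample stays a matrix throughout, so its row and column structure is never flattened into a single long vector; the two projections together need only 8·3 + 6·2 coefficients in this toy setting.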
