Fast Adaptive K-Means Subspace Clustering for High-Dimensional Data

In many real-world applications, data are represented by high-dimensional features. Despite their simplicity, existing K-means subspace clustering algorithms often rely on eigenvalue decomposition to obtain an approximate solution, which makes them inefficient. Moreover, their loss functions are sensitive either to outliers or to small errors. In this paper, we propose a fast adaptive K-means (FAKM) subspace clustering model in which an adaptive loss function provides a flexible mechanism for computing the cluster indicators, making the model suitable for datasets with different distributions. To find the optimal feature subset, FAKM performs clustering and feature selection simultaneously without eigenvalue decomposition, and is therefore efficient for real-world applications. We develop an efficient alternating optimization algorithm to solve the proposed model, together with theoretical analyses of its convergence and computational complexity. Finally, extensive experiments on several benchmark datasets demonstrate the advantages of FAKM over state-of-the-art clustering algorithms.
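To make the alternating scheme described above concrete, the minimal sketch below shows a re-weighted K-means loop that combines an adaptive per-sample loss with per-feature weights. It is an illustration only: the loss form h(e) = (1 + s)e^2 / (e + s), the IRLS-style re-weighting derived from it, and the soft/hard feature-weight update are assumptions borrowed from the adaptive-loss literature, not the authors' exact FAKM updates. It does, however, reflect the two properties the abstract emphasizes: closed-form updates with no eigenvalue decomposition, and joint clustering plus feature selection.

```python
# Hypothetical sketch of an adaptive-loss K-means with joint feature selection.
# The loss h(e) = (1+s)*e^2/(e+s) and all update rules are illustrative assumptions.
import numpy as np

def adaptive_kmeans_sketch(X, k, n_features_kept=None, s=1.0, n_iter=30, seed=0):
    """X: (n, d) data matrix; k: number of clusters; s: adaptive-loss parameter."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_features_kept = n_features_kept or d
    w = np.ones(d) / d                       # feature weights (soft feature selection)
    C = X[rng.choice(n, k, replace=False)]   # initial centroids

    for _ in range(n_iter):
        # 1) Assign samples to clusters under feature-weighted squared distances.
        D = ((X[:, None, :] - C[None, :, :]) ** 2 * w).sum(-1)   # (n, k)
        labels = D.argmin(1)
        e = np.sqrt(D[np.arange(n), labels]) + 1e-12             # per-sample error

        # 2) Adaptive loss -> per-sample weights r = h'(e) / (2e) (IRLS-style).
        r = (1 + s) * (e + 2 * s) / (2 * (e + s) ** 2)

        # 3) Weighted centroid update in closed form (no eigendecomposition).
        for j in range(k):
            m = labels == j
            if m.any():
                C[j] = np.average(X[m], axis=0, weights=r[m])

        # 4) Feature-weight update: favor features with small within-cluster error,
        #    then keep only the top-ranked features (hard selection step).
        err = np.zeros(d)
        for j in range(k):
            m = labels == j
            if m.any():
                err += (r[m, None] * (X[m] - C[j]) ** 2).sum(0)
        w = np.exp(-err / (err.mean() + 1e-12))
        keep = np.argsort(w)[::-1][:n_features_kept]
        mask = np.zeros(d)
        mask[keep] = 1.0
        w = w * mask
        w /= w.sum()

    return labels, C, w
```

Because every step has a closed-form update, each iteration costs O(nkd), which is the kind of per-iteration complexity that avoids the cubic cost of eigenvalue decomposition; the actual FAKM updates and complexity analysis are given in the paper itself.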
