Flexible Subspace Clustering: A Joint Feature Selection and K-Means Clustering Framework

Abstract As an important computing paradigm, cloud computing addresses large, distributed databases and relatively simple computation. In this paradigm, data mining is one of the most important and fundamental problems. Sensors and other intelligent devices generate large amounts of data, and mining these big data is crucial in various applications. K-means clustering is a typical technique for grouping similar data into the same cluster and is commonly used in data mining. However, it remains challenging to apply to data containing a large amount of noise, outliers, and redundant features. In this paper, we propose a robust K-means clustering algorithm, namely flexible subspace clustering. The proposed method incorporates feature selection and K-means clustering into a unified framework, which selects refined features and improves clustering performance. Moreover, to enhance robustness, the $\ell_{2,p}$-norm is embedded into the objective function. An appropriate $p$ can be chosen flexibly according to the data, yielding more robust performance. Experimental results on benchmark databases verify that the presented method is more robust and performs better than existing approaches.
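To make the role of $p$ concrete, the following is a minimal sketch of one common definition of the $\ell_{2,p}$-norm, $\|M\|_{2,p} = (\sum_i \|m_i\|_2^p)^{1/p}$, taken over the rows $m_i$ of a matrix $M$. It is not the objective function of the proposed method, which the abstract does not spell out. The row-wise convention, the function names `l2p_norm` and `row_shares`, and the toy `residuals` matrix are all assumptions made for illustration.

```python
import numpy as np


def l2p_norm(M, p):
    """l2,p-norm under an assumed row-wise convention: (sum_i ||m_i||_2^p)^(1/p)."""
    if p <= 0:
        raise ValueError("p must be positive")
    row_norms = np.linalg.norm(M, axis=1)  # Euclidean norm of each row
    return float(np.sum(row_norms ** p) ** (1.0 / p))


def row_shares(M, p):
    """Fraction of sum_i ||m_i||_2^p contributed by each row (hypothetical helper)."""
    powered = np.linalg.norm(M, axis=1) ** p
    return powered / powered.sum()


# Toy residual matrix (illustrative data only): the last row plays the outlier.
residuals = np.array([
    [3.0, 4.0],   # row norm 5
    [0.0, 0.0],   # row norm 0
    [6.0, 8.0],   # row norm 10
])

frobenius = l2p_norm(residuals, 2.0)  # p = 2 coincides with the Frobenius norm
l21 = l2p_norm(residuals, 1.0)        # p = 1 is the l2,1-norm: 5 + 0 + 10

# Share of the summed p-th powers that comes from the outlier row.
outlier_share = {p: float(row_shares(residuals, p)[-1]) for p in (2.0, 1.0, 0.5)}
```

In this toy case the outlier row accounts for 0.8 of the summed terms at $p = 2$ and about 0.67 at $p = 1$, and its share falls further at $p = 0.5$. This matches the usual intuition for why a smaller $p$ is expected to be less sensitive to outliers.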
