One-step spectral rotation clustering for imbalanced high-dimensional data

Abstract In practical applications, the class distribution of a data set is often skewed. Because traditional clustering methods are designed mainly to improve overall learning performance, they tend to recover the majority class well while the minority class, which is often the more valuable one, may be ignored. Moreover, existing clustering methods often perform poorly on data that are both imbalanced and high-dimensional. In this paper, we present one-step spectral rotation clustering for imbalanced high-dimensional data (OSRCIH), which integrates self-paced learning and spectral rotation clustering in a unified learning framework where sample selection and dimensionality reduction are performed jointly and updated iteratively, each informing the other. Specifically, the imbalance problem is addressed by selecting the same number of training samples from each intrinsic group of the training data, with the sample-weight vector obtained by self-paced learning. Dimensionality reduction is carried out by combining subspace learning and feature selection. Experiments on synthetic and real data sets showed that OSRCIH could identify important samples and features and increase their weights, which keeps the clustering from favoring the majority class and effectively improves clustering performance.
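The balanced sample-selection idea can be illustrated with a minimal sketch. It is not the OSRCIH optimization itself: it assumes that per-sample losses and a current group assignment are already available, and it uses hard (binary) self-paced weights. The function name `balanced_self_paced_weights` and the parameter `per_group` are hypothetical names introduced here for illustration.

```python
import numpy as np

def balanced_self_paced_weights(losses, groups, per_group):
    """Return binary sample weights (illustrative sketch, not the paper's solver).

    Within each group, the `per_group` samples with the smallest loss
    ("easiest" samples in the self-paced sense) receive weight 1 and all
    others receive weight 0, so every group contributes at most the same
    number of samples regardless of its size.
    """
    losses = np.asarray(losses, dtype=float)
    groups = np.asarray(groups)
    weights = np.zeros(losses.shape[0])
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)           # members of this group
        order = idx[np.argsort(losses[idx], kind="stable")]  # easiest first
        weights[order[:per_group]] = 1.0            # keep the easiest ones
    return weights

# Toy data (assumed): group 0 is the majority (6 samples), group 1 the minority (2).
losses = [0.1, 0.9, 0.2, 0.3, 0.8, 0.05, 0.7, 0.4]
groups = [0, 0, 0, 0, 0, 0, 1, 1]
w = balanced_self_paced_weights(losses, groups, per_group=2)
```

With `per_group=2`, both groups contribute two samples each, so the majority group no longer dominates the selected set. In an iterative scheme one would typically raise `per_group` over successive rounds so that harder samples are admitted gradually.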
