Supervised Feature Selection With a Stratified Feature Weighting Method

Feature selection is a powerful tool for handling high-dimensional data. However, most existing methods are biased toward the highest-ranked features, which may be highly correlated with one another. In this paper, we address this problem by proposing a stratified feature ranking (SFR) method for supervised feature ranking of high-dimensional data. Given a dataset with class labels, we first propose subspace feature clustering (SFC) to simultaneously identify feature clusters and the importance of each feature for each class. In the SFR method, the features in different feature clusters are ranked separately according to the subspace weights produced by SFC. A stratified feature weighting method then combines these per-cluster rankings so that the highly ranked features are both informative and diverse. We conducted a series of experiments to verify the effectiveness and scalability of SFC for feature clustering. The proposed SFR method was compared with six feature selection methods on a set of high-dimensional datasets, and the results show that SFR was superior to most of them.
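To make the stratified ranking idea concrete, here is a minimal sketch, not the paper's exact SFC/SFR algorithms. It assumes two stand-ins: k-means on feature profiles in place of SFC's feature clustering, and ANOVA F-scores in place of the class-specific subspace weights (the paper derives both jointly from the SFC objective). The interleaving step illustrates the stratified ranking: the best remaining feature is taken from each cluster in turn, so correlated features from the same cluster are pushed apart in the final ranking.

```python
# Sketch of stratified feature ranking under the stated assumptions;
# KMeans and f_classif are stand-ins for the paper's SFC clustering
# and subspace weights, not the authors' actual method.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import f_classif

def stratified_feature_ranking(X, y, n_clusters=5, random_state=0):
    """Rank features so that top positions are informative and diverse.

    X : (n_samples, n_features) data matrix
    y : (n_samples,) class labels
    """
    # Stand-in for SFC: cluster features by their profiles across samples.
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=random_state).fit_predict(X.T)

    # Stand-in for the subspace weights: a per-feature relevance score.
    scores, _ = f_classif(X, y)

    # Rank features within each cluster by descending score.
    per_cluster = [
        (idx := np.flatnonzero(clusters == c))[np.argsort(-scores[idx])]
        for c in range(n_clusters)
    ]

    # Stratified interleaving: take the best remaining feature from each
    # cluster in turn, so the top of the ranking spans all clusters.
    ranking, depth = [], 0
    while len(ranking) < X.shape[1]:
        for lst in per_cluster:
            if depth < len(lst):
                ranking.append(lst[depth])
        depth += 1
    return np.array(ranking)

# Usage: rank the features of a toy dataset and keep the top 10.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))
    y = rng.integers(0, 2, size=100)
    print(stratified_feature_ranking(X, y)[:10])
```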
