Informative Feature Clustering and Selection for Gene Expression Data

Feature selection aims to remove irrelevant and redundant features from the input data. Selecting important genes is essential for gene expression data, which typically contain a very large number of genes. However, commonly used feature selection methods tend to be biased toward the highest-ranked features, and the selected features may be highly correlated with one another. To overcome these problems, we propose an informative feature clustering and selection method that selects informative and diverse genes from gene expression data. The method consists of two steps. In the first step, a feature clustering (FC) method groups all genes into several gene clusters. In FC, a set of feature weights is computed to reflect the importance of each gene, and the genes within each cluster are sorted by these weights. In the second step, a stratified feature selection (SFS) method selects genes from the different gene clusters and combines them into the final feature set. Experiments on several gene expression datasets demonstrate the superiority of the proposed method over six popular feature selection methods.
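The sketch below illustrates the general shape of this two-step pipeline, not the paper's actual formulation: k-means over gene expression profiles stands in for the FC step, mutual information with the class label stands in for the paper's feature weights, and a round-robin pick across clusters stands in for SFS. All function names and parameters here are illustrative assumptions.

```python
# Minimal sketch of a cluster-then-select pipeline (stand-in components,
# not the paper's FC/SFS formulations).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import mutual_info_classif

def cluster_and_select(X, y, n_clusters=10, n_selected=50, seed=0):
    """X: (n_samples, n_genes) expression matrix; y: class labels.
    Returns indices of selected genes, drawn evenly across gene clusters."""
    # Step 1 (FC stand-in): cluster genes by their expression profiles
    # (each gene is a point in sample space), then score each gene.
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(X.T)
    weights = mutual_info_classif(X, y, random_state=seed)

    # Step 2 (SFS stand-in): take the top-weighted genes from every cluster
    # in turn, so the final set is both informative and diverse.
    per_cluster = [sorted(np.where(labels == c)[0], key=lambda g: -weights[g])
                   for c in range(n_clusters)]
    selected, rank = [], 0
    while len(selected) < n_selected:
        added = False
        for genes in per_cluster:
            if rank < len(genes) and len(selected) < n_selected:
                selected.append(genes[rank])
                added = True
        if not added:  # all clusters exhausted
            break
        rank += 1
    return np.array(selected)
```

Selecting round-robin across clusters, rather than taking a single global top-k, is what keeps the final gene set diverse: highly correlated genes tend to fall into the same cluster and therefore cannot dominate the selection.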
