Gene selection for microarray data classification via subspace learning and manifold regularization

Abstract

With the rapid development of DNA microarray technology, a large amount of genomic data has been generated. Classifying these microarray data is a challenging task because gene expression data often contain thousands of genes but only a small number of samples. In this paper, an effective gene selection method is proposed that selects the best subset of genes for microarray data by removing irrelevant and redundant genes. Compared with the original data, the selected gene subset benefits the classification task. We formulate gene selection as a manifold-regularized subspace learning problem. Specifically, a projection matrix projects the original high-dimensional microarray data into a lower-dimensional subspace, under the constraint that the original genes can be well represented by the selected genes. Meanwhile, the local manifold structure of the original data is preserved by a Laplacian graph regularization term on the low-dimensional data space. The projection matrix then serves as an importance indicator for the individual genes. An iterative update algorithm is developed to solve the problem. Experimental results on six publicly available microarray datasets and one clinical dataset show that the proposed method achieves better microarray classification performance than other state-of-the-art methods.
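The abstract does not state the objective function or the update rules, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes a nonnegative, factorization-style objective ||X - XWH||_F^2 + alpha * tr(W^T X^T L X W) + beta * ||W^T W - I||_F^2, where X (samples x genes) is nonnegative, W (genes x k) is the projection matrix, H (k x genes) reconstructs every gene from the subspace, and L = D - S is the Laplacian of a kNN heat-kernel graph built over the samples. The soft orthogonality penalty, the multiplicative updates, the kernel-width heuristic and the choice of scoring genes by the row-wise l2 norm of W are all assumptions; the function names `knn_heat_graph` and `select_genes` and their parameters are hypothetical.

```python
import numpy as np


def knn_heat_graph(X, n_neighbors=5, t=None):
    """Symmetric kNN heat-kernel affinity S over the samples (rows of X) and its degree vector."""
    n = X.shape[0]
    n_neighbors = min(n_neighbors, n - 1)
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if t is None:
        t = dist2.mean() + 1e-12  # kernel width: an assumed heuristic, not from the paper
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)  # a sample is not its own neighbour
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(masked[i])[:n_neighbors]
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / t)
    S = np.maximum(S, S.T)  # keep edge (i, j) if either sample lists the other
    return S, S.sum(axis=1)


def select_genes(X, n_selected, k=None, alpha=1.0, beta=1.0,
                 n_neighbors=5, n_iter=200, seed=0, eps=1e-10):
    """Rank genes (columns of X) by the row norms of a learned projection W.

    Hypothetical objective (an assumption for illustration):
        ||X - X W H||_F^2 + alpha * tr(W^T X^T L X W) + beta * ||W^T W - I||_F^2
    with W >= 0, H >= 0 and L = diag(deg) - S, minimised by multiplicative updates.
    """
    X = np.asarray(X, dtype=float)
    if X.min() < 0:
        raise ValueError("multiplicative updates assume a nonnegative X; rescale first")
    n, d = X.shape
    k = k or n_selected  # subspace dimension
    rng = np.random.default_rng(seed)
    W = rng.random((d, k))  # projection: genes -> k-dimensional subspace
    H = rng.random((k, d))  # reconstructs all genes from the subspace
    S, deg = knn_heat_graph(X, n_neighbors)
    for _ in range(n_iter):
        XW = X @ W  # low-dimensional representation of the samples (n x k)
        # H-step: ratio of the negative and positive parts of the gradient w.r.t. H
        H *= (XW.T @ X) / ((XW.T @ XW) @ H + eps)
        # W-step: reconstruction term, graph term (S vs. degree) and soft orthogonality
        num = X.T @ (X @ H.T) + alpha * X.T @ (S @ XW) + 2.0 * beta * W
        den = (X.T @ (XW @ (H @ H.T)) + alpha * X.T @ (deg[:, None] * XW)
               + 2.0 * beta * W @ (W.T @ W) + eps)
        W *= num / den
    scores = np.linalg.norm(W, axis=1)  # one importance score per gene
    return np.argsort(-scores)[:n_selected], scores


if __name__ == "__main__":
    X_demo = np.random.default_rng(1).random((30, 200))  # synthetic: 30 samples x 200 genes
    selected, scores = select_genes(X_demo, n_selected=10)
    print(selected)
```

Because multiplicative updates need X >= 0, log-scale expression values would first have to be shifted or min-max scaled per gene. The products are grouped as X.T @ (X @ ...) so that no genes-by-genes matrix is ever formed, which matters when there are thousands of genes; the returned column indices would then subset X before training any downstream classifier.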
