Locally linear embedding and neighborhood rough set-based gene selection for gene expression data classification.

Cancer subtype recognition and feature selection are important problems in the diagnosis and treatment of tumors. Here, we propose a novel gene selection approach for gene expression data classification. First, two classical feature reduction methods, locally linear embedding (LLE) and rough set (RS), are summarized. The advantages and disadvantages of these algorithms are analyzed, and an optimized model for tumor gene selection is developed based on LLE and neighborhood RS (NRS). The Bhattacharyya distance is introduced to delete irrelevant genes, pairwise redundancy analysis is performed to remove strongly correlated genes, and a wavelet soft threshold is determined to eliminate noise in the gene datasets. Next, prior optimized search processing is carried out. A new approach combining the dimension reduction of LLE with the feature reduction of NRS (LLE-NRS) is developed for selecting gene subsets, and the open-source software Weka is then applied to distinguish different tumor types and verify the cross-validation classification accuracy of the proposed method. The experimental results demonstrate that gene subsets selected by LLE-NRS yield higher classification accuracy than those of other related models, and that the proposed approach is feasible and effective for high-dimensional tumor classification.
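The abstract does not give formulas for the preprocessing steps, so the following is only a minimal sketch of two of them under stated assumptions: each gene's expression within a class is modeled as a univariate Gaussian (an assumption, not necessarily the form used in the paper), genes are ranked by the resulting Bhattacharyya distance between two classes, and denoising uses the standard soft-threshold rule applied to a single coefficient. The function names `bhattacharyya_distance`, `rank_genes`, and `soft_threshold` are hypothetical.

```python
import math
from statistics import mean, pvariance


def bhattacharyya_distance(a, b, eps=1e-12):
    """Bhattacharyya distance between two samples.

    Assumption: each sample is modeled as a univariate Gaussian.
    eps guards against zero variance.
    """
    mu_a, mu_b = mean(a), mean(b)
    var_a = pvariance(a) + eps
    var_b = pvariance(b) + eps
    var_sum = var_a + var_b
    # Term driven by the separation of the class means.
    mean_term = 0.25 * (mu_a - mu_b) ** 2 / var_sum
    # Term driven by the mismatch of the class variances.
    var_term = 0.5 * math.log(var_sum / (2.0 * math.sqrt(var_a * var_b)))
    return mean_term + var_term


def rank_genes(expr, labels):
    """Return gene indices sorted from most to least class-separating.

    expr: list of samples, each a list of per-gene expression values.
    labels: list of 0/1 class labels, one per sample (two-class case only).
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        class0 = [row[g] for row, y in zip(expr, labels) if y == 0]
        class1 = [row[g] for row, y in zip(expr, labels) if y == 1]
        scores.append(bhattacharyya_distance(class0, class1))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


def soft_threshold(x, t):
    """Standard soft-threshold rule: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


# Toy data: gene 0 separates the classes, gene 1 does not.
expr = [
    [1.0, 5.0], [1.2, 3.0], [0.8, 4.0],   # class 0
    [5.0, 5.0], [5.2, 3.0], [4.8, 4.0],   # class 1
]
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_genes(expr, labels)
```

In a filter of this kind, genes at the tail of `ranking` would be the candidates for removal as irrelevant; the cutoff is a design choice that the abstract does not specify.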
