A filter feature selection method based on LLRFC and redundancy analysis for tumor classification using gene expression data

Tumor gene expression data are characterized by high dimensionality and small sample size, which pose a serious challenge for tumor classification. Because not all genes are associated with tumor phenotypes, irrelevant features can seriously degrade learning performance, so relevant features must be selected from the original data. In this paper, we propose a new filter feature selection method based on the graph embedding framework for manifold learning, named the LLRFC (locally linear representation Fisher criterion) score. This method takes into account the relationship between sample classes and features. However, the features it selects may still contain some redundancy, so we improve the method by eliminating redundancy among the selected features; the improved method is named LLRFC score+. We compare our method with several other feature selection approaches on nine public tumor gene expression datasets, and the experimental results demonstrate that the proposed method is promising and effective for tumor classification.
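The abstract describes a two-stage filter without giving formulas: a class-aware feature score derived from the graph embedding framework, followed by redundancy elimination. The Python sketch below illustrates that "score, then remove redundancy" pattern under explicit assumptions; it is not the LLRFC score. Here the intrinsic graph simply links every same-class sample pair and the penalty graph every different-class pair, both with unit weight, instead of whatever locally linear representation weighting LLRFC uses (not specified in this abstract). Because f^T L f = (1/2) Σ_ij W_ij (f_i − f_j)^2 for a graph Laplacian L = D − W, summing squared differences over unordered pairs gives the ratio f^T L_p f / f^T L f for one gene f. Redundancy is measured with absolute Pearson correlation against a hypothetical threshold `max_corr`, and the names `graph_embedding_score`, `pearson`, and `select_features` are illustrative only.

```python
from itertools import combinations
from math import sqrt


def graph_embedding_score(values, labels):
    """Between-class / within-class scatter of one feature on two class graphs.

    Hypothetical stand-in, NOT the LLRFC score: the intrinsic graph joins every
    same-class pair and the penalty graph every different-class pair, all with
    unit weight. Summing (f_i - f_j)^2 over unordered pairs equals f^T L f for
    the corresponding graph Laplacian L = D - W.
    """
    within = between = 0.0
    for i, j in combinations(range(len(values)), 2):
        d2 = (values[i] - values[j]) ** 2
        if labels[i] == labels[j]:
            within += d2
        else:
            between += d2
    if within == 0.0:
        # Zero within-class scatter: rank first if the gene varies across classes.
        return float("inf") if between > 0.0 else 0.0
    return between / within


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def select_features(X, labels, k, max_corr=0.9):
    """Return indices of up to k genes, best score first, skipping redundant ones.

    X is a list of samples, each a list of gene expression values. A gene is
    dropped if its |Pearson r| with an already kept gene reaches max_corr
    (hypothetical threshold; the paper's redundancy measure is not given here).
    """
    n_features = len(X[0])
    columns = [[row[f] for row in X] for f in range(n_features)]
    scores = [graph_embedding_score(col, labels) for col in columns]
    # Stage 1: rank genes by the class-aware filter score (higher is better).
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    # Stage 2: greedy redundancy elimination over the ranked list.
    selected = []
    for f in ranked:
        if all(abs(pearson(columns[f], columns[g])) < max_corr for g in selected):
            selected.append(f)
        if len(selected) == k:
            break
    return selected


if __name__ == "__main__":
    # Toy data: 6 samples x 4 genes, two classes. Gene 1 is exactly 2 * gene 0
    # (redundant), gene 2 is weakly informative, gene 3 is constant.
    X = [
        [1.0, 2.0, 3.0, 1.0],
        [1.1, 2.2, 1.0, 1.0],
        [0.9, 1.8, 2.0, 1.0],
        [5.0, 10.0, 2.5, 1.0],
        [5.1, 10.2, 1.5, 1.0],
        [4.9, 9.8, 2.0, 1.0],
    ]
    labels = [0, 0, 0, 1, 1, 1]
    print(select_features(X, labels, k=2))
```

On the toy data, the demo should keep one of the two perfectly correlated genes together with gene 2, whereas a plain top-2 ranking by score alone would return both copies of the strongest signal. Each gene is scored independently and no classifier is trained during selection, which is what makes this a filter approach.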
