Iterative Feature perturbation as a gene Selector for microarray Data

Gene-expression microarray datasets often consist of a limited number of samples with a large number of gene-expression measurements, usually on the order of thousands. Therefore, dimensionality reduction is critical prior to any classification task. In this work, the iterative feature perturbation method (IFP), an embedded gene selector, is introduced and applied to four microarray cancer datasets: colon cancer, leukemia, Moffitt colon cancer, and lung cancer. We compare results obtained by IFP to those of support vector machine-recursive feature elimination (SVM-RFE) and the t-test as a feature filter using a linear support vector machine as the base classifier. Analysis of the intersection of gene sets selected by the three methods across the four datasets was done. Additional experiments included an initial pre-selection of the top 200 genes based on their p values. IFP and SVM-RFE were then applied on the reduced feature sets. These results showed up to 3.32% average performance improvement for IFP across the four datasets. A statistical analysis (using the Friedman/Holm test) for both scenarios showed the highest accuracies came from the t-test as a filter on experiments without gene pre-selection. IFP and SVM-RFE had greater classification accuracy after gene pre-selection. Analysis showed the t-test is a good gene selector for microarray data. IFP and SVM-RFE showed performance improvement on a reduced by t-test dataset. The IFP approach resulted in comparable or superior average class accuracy when compared to SVM-RFE on three of the four datasets. The same or similar accuracies can be obtained with different sets of genes.

[1]  Walter L. Ruzzo,et al.  Improved Gene Selection for Classification of Microarrays , 2002, Pacific Symposium on Biocomputing.

[2]  Li Chen,et al.  Noise-Based Feature Perturbation as a Selection Method for Microarray Data , 2007, ISBRA.

[3]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[4]  Lawrence O. Hall,et al.  Feature selection for microarray data by AUC analysis , 2008, 2008 IEEE International Conference on Systems, Man and Cybernetics.

[5]  Igor V. Tetko,et al.  Gene selection from microarray data for cancer classification - a machine learning approach , 2005, Comput. Biol. Chem..

[6]  Zexuan Zhu,et al.  Wrapper–Filter Feature Selection Algorithm Using a Memetic Framework , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[7]  Dorian Pyle,et al.  Data Preparation for Data Mining , 1999 .

[8]  Masoud Nikravesh,et al.  Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing) , 2006 .

[9]  I. Yang,et al.  Molecular staging for survival prediction of colorectal cancer patients. , 2005, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.

[10]  Li Wang,et al.  Hybrid huberized support vector machines for microarray classification and gene selection , 2008, Bioinform..

[11]  Todd,et al.  Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning , 2002, Nature Medicine.

[12]  Ramón Díaz-Uriarte,et al.  Gene selection and classification of microarray data using random forest , 2006, BMC Bioinformatics.

[13]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[14]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[15]  Qingzhong Liu,et al.  Feature Selection and Ranking of Key Genes for Tumor Classification: Using Microarray Gene Expression Data , 2006, ICAISC.

[16]  W. Gerald,et al.  Integration of gene expression profiling and clinical variables to predict prostate carcinoma recurrence after radical prostatectomy , 2005, Cancer.

[17]  Pedro Larrañaga,et al.  Filter versus wrapper gene selection approaches in DNA microarray domains , 2004, Artif. Intell. Medicine.

[18]  M. Friedman The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance , 1937 .

[19]  Marti A. Hearst Trends & Controversies: Support Vector Machines , 1998, IEEE Intell. Syst..

[20]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[21]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[22]  D. Allison,et al.  Microarray data analysis: from disarray to consolidation and consensus , 2006, Nature Reviews Genetics.

[23]  Lawrence O. Hall,et al.  Filtering for improved gene selection on microarray data , 2010, 2010 IEEE International Conference on Systems, Man and Cybernetics.

[24]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[25]  Bernhard Schölkopf,et al.  Extracting Support Data for a Given Task , 1995, KDD.

[26]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[27]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[28]  Igor Jurisica,et al.  Gene expression–based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study , 2008, Nature Medicine.

[29]  Pedro Larrañaga,et al.  A review of feature selection techniques in bioinformatics , 2007, Bioinform..

[30]  Sinisa Todorovic,et al.  Local-Learning-Based Feature Selection for High-Dimensional Data Analysis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  P. A. Futreal,et al.  Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. , 2012, The New England journal of medicine.

[32]  Yudong D. He,et al.  Gene expression profiling predicts clinical outcome of breast cancer , 2002, Nature.

[33]  S. Holm A Simple Sequentially Rejective Multiple Test Procedure , 1979 .

[34]  John Quackenbush Microarray analysis and tumor classification. , 2006, The New England journal of medicine.