Review on Feature Selection Techniques and the Impact of SVM for Cancer Classification using Gene Expression Profile

DNA microarray technology has changed biology research: scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. Gene expression profiles, which represent the state of a cell at the molecular level, have great potential as a medical diagnostic tool. However, compared with the number of genes involved, the available training data sets generally contain few samples, and this limitation poses a challenge for certain classification methodologies. Feature selection techniques can be used to extract the marker genes that most influence classification accuracy by eliminating unwanted noisy and redundant genes. This paper presents a review of the feature selection techniques that have been employed in microarray-based cancer classification and of the predominant role of support vector machines (SVMs) in cancer classification.
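As a rough illustration of the "many genes, few samples" setting described above, the sketch below ranks genes with a simple filter score and keeps the top few as candidate markers. The abstract does not single out one technique, so the choice of a signal-to-noise score (difference of class means divided by the sum of class standard deviations) is an assumption made here for illustration, as are the function names, the synthetic data, and the planted gene indices. No classifier is trained; in practice the selected genes would then be passed to a classifier such as an SVM.

```python
import random
import statistics


def signal_to_noise(values_a, values_b):
    """Score one gene: difference of class means over the sum of class std devs."""
    mean_a = statistics.fmean(values_a)
    mean_b = statistics.fmean(values_b)
    denom = statistics.stdev(values_a) + statistics.stdev(values_b)
    # A gene that is constant in both classes carries no usable signal here.
    return 0.0 if denom == 0 else (mean_a - mean_b) / denom


def rank_genes(samples, labels, top_k):
    """Return the indices of the top_k genes by absolute signal-to-noise score.

    samples: list of rows, one row of expression values per tissue sample.
    labels:  list of 0/1 class labels, one per sample.
    """
    n_genes = len(samples[0])
    class_a = [row for row, y in zip(samples, labels) if y == 0]
    class_b = [row for row, y in zip(samples, labels) if y == 1]
    scores = []
    for g in range(n_genes):
        a = [row[g] for row in class_a]
        b = [row[g] for row in class_b]
        scores.append(abs(signal_to_noise(a, b)))
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:top_k]


def make_toy_data(n_per_class=10, n_genes=200, informative=(3, 17, 42), seed=0):
    """Hypothetical data: Gaussian noise, with a few genes shifted in class 1."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in informative:
                row[g] += 6.0 * label  # planted marker genes
            samples.append(row)
            labels.append(label)
    return samples, labels


samples, labels = make_toy_data()
selected = rank_genes(samples, labels, top_k=3)
print("Candidate marker genes:", sorted(selected))
```

With only 20 samples and 200 genes, a filter of this kind reduces the problem to a handful of features before any classifier is fitted. The planted genes would be expected to rank near the top on this toy data, but a per-gene score cannot detect redundancy between correlated genes, which is one reason wrapper and embedded methods are also used.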
