Comparison of feature selection methods for multiclass cancer classification based on microarray data

Multiclass cancer classification remains a challenging task in the field of machine learning. We presented a comparative study of seven feature selection methods and evaluated their performance by six different types of classification methods. We applied it to the four multiclass cancer datasets. We demonstrated that feature selection is critical for multiclass cancer classification performance. We also demonstrated that an appropriate combination of feature selection techniques and classification methods makes it possible to achieve excellent performance on multiclass cancer classification task. Support vector machine method based on recursive feature elimination (SVM-RFE) feature selection algorithm combined with sequential minimal optimization algorithm for training support vector machines (SMO) classification method showed the best performance.

[1]  Christian A. Rees,et al.  Systematic variation in gene expression patterns in human cancer cell lines , 2000, Nature Genetics.

[2]  Yudong D. He,et al.  Gene expression profiling predicts clinical outcome of breast cancer , 2002, Nature.

[3]  Mugdha Gadgil,et al.  Comparison of feature selection and classification combinations for cancer classification using microarray data , 2009, Int. J. Bioinform. Res. Appl..

[4]  Pedro Larrañaga,et al.  A review of feature selection techniques in bioinformatics , 2007, Bioinform..

[5]  Ian Witten,et al.  Data Mining , 2000 .

[6]  Jack Y. Yang,et al.  A comparative study of different machine learning methods on microarray gene expression data , 2008, BMC Genomics.

[7]  Constantin F. Aliferis,et al.  A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis , 2004, Bioinform..

[8]  T. Poggio,et al.  Prediction of central nervous system embryonal tumour outcome based on gene expression , 2002, Nature.

[9]  Igor V. Tetko,et al.  Gene selection from microarray data for cancer classification - a machine learning approach , 2005, Comput. Biol. Chem..

[10]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[11]  Ian H. Witten,et al.  Data mining in bioinformatics using Weka , 2004, Bioinform..

[12]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[13]  Igor Kononenko,et al.  Estimating Attributes: Analysis and Extensions of RELIEF , 1994, ECML.

[14]  Huan Liu,et al.  Chi2: feature selection and discretization of numeric attributes , 1995, Proceedings of 7th IEEE International Conference on Tools with Artificial Intelligence.

[15]  Marcel J. T. Reinders,et al.  A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets , 2006, BMC Bioinformatics.

[16]  Tao Li,et al.  A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression , 2004, Bioinform..

[17]  H. Horvitz,et al.  MicroRNA expression profiles classify human cancers , 2005, Nature.

[18]  Mark A. Hall,et al.  Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning , 1999, ICML.

[19]  Huiqing Liu,et al.  A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns. , 2002, Genome informatics. International Conference on Genome Informatics.

[20]  T. Poggio,et al.  Multiclass cancer diagnosis using tumor gene expression signatures , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[21]  Chris H. Q. Ding,et al.  Minimum redundancy feature selection from microarray gene expression data , 2003, Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003.