Gene selection for cancer classification using a hybrid of univariate and multivariate feature selection methods

Approaches to gene selection for cancer classification from microarray data fall into two categories: univariate methods and multivariate methods. Univariate methods evaluate each gene in isolation, measuring its contribution to the classification without regard to the other genes. Multivariate methods, in contrast, measure the relative contribution of a gene by taking the other genes in the data into account. Multivariate methods generally select fewer genes, but their selection process can be sensitive to irrelevant genes, noise in the expression measurements, and outliers in the training data, and their computational cost is high. To overcome the disadvantages of both types of approach, we propose a hybrid method that yields gene sets that are small and highly discriminative. The hybrid is built from the univariate Maximum Likelihood method (LIK) and the multivariate Recursive Feature Elimination method (RFE). We analyze the properties of these methods and systematically test the proposed method on two cancer microarray datasets: a leukemia dataset and a small round blue cell tumors dataset. In these experiments, the hybrid method discovers gene sets smaller than those reported in the literature while achieving the same or better prediction accuracy.
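
The general idea of pairing a univariate filter with recursive elimination can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not define LIK, so a simple signal-to-noise score stands in for it; a least-squares linear classifier trained by gradient descent stands in for the classifier used inside RFE; and the filter-then-eliminate ordering is itself an assumption. The names `hybrid_select`, `n_filter`, `n_final`, and `make_toy_data` are hypothetical.

```python
import random
from statistics import mean, pstdev


def univariate_scores(X, y):
    """Score each gene on its own: |mean1 - mean0| / (sd1 + sd0).
    Stand-in for the LIK score, which the abstract does not define."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos) + pstdev(neg) + 1e-12  # avoid division by zero
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def train_linear(X, y, cols, epochs=200, lr=0.1):
    """Fit a least-squares linear classifier on the chosen columns by
    batch gradient descent (assumed stand-in for the RFE classifier)."""
    w = [0.0] * len(cols)
    b = 0.0
    targets = [1.0 if label == 1 else -1.0 for label in y]
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(cols)
        grad_b = 0.0
        for row, t in zip(X, targets):
            err = b + sum(wk * row[c] for wk, c in zip(w, cols)) - t
            grad_b += err
            for k, c in enumerate(cols):
                grad_w[k] += err * row[c]
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def hybrid_select(X, y, n_filter, n_final):
    """Univariate filter down to n_filter genes, then recursively drop the
    gene with the smallest squared weight until n_final genes remain."""
    scores = univariate_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    cols = ranked[:n_filter]
    while len(cols) > n_final:
        w = train_linear(X, y, cols)
        weakest = min(range(len(cols)), key=lambda k: w[k] ** 2)
        cols.pop(weakest)
    return sorted(cols)


def make_toy_data(seed=0, n_samples=20, n_genes=10):
    """Synthetic data: gene 0 tracks the class label, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = []
    for label in y:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        row[0] = (1.0 if label == 1 else -1.0) + rng.gauss(0.0, 0.1)
        X.append(row)
    return X, y


X, y = make_toy_data()
selected = hybrid_select(X, y, n_filter=5, n_final=1)
print(selected)
```

In this sketch the cheap univariate pass shrinks the candidate pool before the costlier multivariate elimination runs, which is one plausible way to limit the sensitivity to irrelevant genes and the computational cost that the abstract attributes to multivariate methods.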
