Significance of Gene Ranking for Classification of Microarray Samples

Many methods for classification and gene selection with microarray data have been developed. These methods usually give a ranking of genes. Evaluating the statistical significance of the gene ranking is important for understanding the results and for further biological investigations, but this question has not been well addressed for machine learning methods in existing works. Here, we address this problem by formulating it in the framework of hypothesis testing and propose a solution based on resampling. The proposed r-test methods convert gene ranking results into position p-values to evaluate the significance of genes. The methods are tested on three real microarray data sets and three simulation data sets with support vector machines as the method of classification and gene selection. The obtained position p-values help to determine the number of genes to be selected and enable scientists to analyze selection results by sophisticated multivariate methods under the same statistical inference paradigm as for simple hypothesis testing methods

[1]  M. Xiong,et al.  Biomarker Identification by Feature Wrappers , 2022 .

[2]  G. Clark,et al.  Reference , 2008 .

[3]  Xuegong Zhang,et al.  ALL/AML Cancer Classification by Gene Expression Data Using SVM and CSVM Approach , 2000 .

[4]  P. Broberg Statistical methods for ranking differentially expressed genes , 2003, Genome Biology.

[5]  H. Yu,et al.  Discovering compact and highly discriminative features or combinations of drug activities using support vector machines , 2003, Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003.

[6]  Wei Pan,et al.  A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments , 2002, Bioinform..

[7]  Yudong D. He,et al.  Gene expression profiling predicts clinical outcome of breast cancer , 2002, Nature.

[8]  Xuegong Zhang,et al.  Recursive Sample Classification and Gene Selection based on SVM: Method and Software Description # , 2001 .

[9]  Sridhar Ramaswamy,et al.  Loss of Heterozygosity and Its Correlation with Expression Profiles in Subclasses of Invasive Breast Cancers , 2004, Cancer Research.

[10]  John D. Storey,et al.  Statistical significance for genomewide studies , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[11]  M. West,et al.  Gene expression predictors of breast cancer outcomes , 2003, The Lancet.

[12]  M. Schummer,et al.  Selecting Differentially Expressed Genes from Microarray Experiments , 2003, Biometrics.

[13]  Cesare Furlanello,et al.  Entropy-based gene ranking without selection bias for the predictive classification of microarray data , 2003, BMC Bioinformatics.

[14]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[15]  Chen-An Tsai,et al.  Testing for differentially expressed genes with microarray data. , 2003, Nucleic acids research.

[16]  T. Poggio,et al.  Multiclass cancer diagnosis using tumor gene expression signatures , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[17]  E. Lander,et al.  A molecular signature of metastasis in primary solid tumors , 2003, Nature Genetics.

[18]  R. Spang,et al.  Predicting the clinical status of human breast cancer by using gene expression profiles , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[19]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[20]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.