Gene selection criterion for discriminant microarray data analysis based on extreme value distributions

An important issue commonly encountered in the analysis of microarray data is deciding which genes, and how many, should be selected for further study. For discriminant microarray data analyses based on statistical models, such as the logistic regression model, this gene selection can be accomplished by comparing the maximum likelihood of the model given the real data, L(D|M), with the expected maximum likelihood of the model given an ensemble of surrogate data, L(D0|M). Typically, the computational burden of obtaining L(D0|M) is immense, often exceeding the limits of available resources by orders of magnitude. Here, we propose an approach that circumvents such heavy computations by mapping the simulation problem to an extreme value problem, which can be easily solved by numerical simulation. We choose three classification problems from two publicly available microarray datasets to illustrate this approach.
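The abstract does not spell out the mapping, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' procedure. It assumes each gene is scored by a single-gene likelihood-ratio statistic, 2[log L(D|M) - log L(D|M_null)], and that under surrogate data this statistic is approximately chi-square distributed with one degree of freedom. Under that assumption, the expected top-ranked statistics among N surrogate genes are order statistics of N chi-square draws, which can be simulated directly without fitting any model to surrogate data. The function names, the chi-square null, and the rank-by-rank comparison rule are all hypothetical choices made for illustration.

```python
import numpy as np

def expected_top_null_statistics(n_genes, n_top, n_rep=200, seed=0):
    """Monte Carlo estimate of the expected n_top largest values among
    n_genes independent chi-square(1) draws (assumed null distribution)."""
    rng = np.random.default_rng(seed)
    # One row per replicate, one column per surrogate gene.
    draws = rng.chisquare(df=1, size=(n_rep, n_genes))
    # Sort each replicate in descending order and keep the top n_top values.
    top = -np.sort(-draws, axis=1)[:, :n_top]
    # Average over replicates: expected 1st, 2nd, ..., n_top-th largest value.
    return top.mean(axis=0)

def count_selected_genes(observed_stats, n_top=50, n_rep=200, seed=0):
    """Count the leading ranks at which the observed statistic exceeds
    the expected null order statistic of the same rank (illustrative rule)."""
    observed = np.sort(np.asarray(observed_stats, dtype=float))[::-1]
    n_top = min(n_top, observed.size)
    null_top = expected_top_null_statistics(observed.size, n_top, n_rep, seed)
    exceeds = observed[:n_top] > null_top
    # Stop at the first rank where the observed value fails to exceed the null.
    return n_top if exceeds.all() else int(np.argmin(exceeds))
```

In this sketch the cost is dominated by drawing and sorting random numbers, which is the point of replacing model fits on surrogate data with an extreme value calculation. Whether the chi-square approximation holds for a given dataset, particularly with small sample sizes, would need to be checked separately.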
