Key aspects of analyzing microarray gene-expression data.

One major challenge in using microarray technology is analyzing the massive amounts of gene-expression data it generates for various applications. This review addresses key aspects of microarray gene-expression data analysis for the two most common objectives: class comparison and class prediction. Class comparison mainly aims to identify the genes that are differentially expressed across experimental conditions. Gene selection is separated into two steps: gene ranking and assigning a significance level. Class prediction uses expression profiling to develop a prediction model for patient selection, diagnostic prediction or prognostic classification. Developing a prediction model involves two components: model building and performance assessment. The review also describes two additional data analysis methods: gene-class testing and multiple ordering criteria.
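The abstract does not say which statistics are used for the two gene-selection steps. As a minimal sketch, it assumes a genes x samples matrix of log-scale expression values and two classes. It ranks genes by the Welch t-statistic, then assigns significance with label-permutation p-values adjusted by the Benjamini-Hochberg false-discovery-rate procedure. These are common choices, not necessarily the ones the review recommends. The function names and the simulated data are hypothetical.

```python
import numpy as np


def welch_t(x, labels):
    """Welch two-sample t-statistic for every gene (row) of x.

    x: genes x samples matrix of log-scale expression values.
    labels: boolean array over samples, True for class 1.
    """
    a, b = x[:, labels], x[:, ~labels]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se


def permutation_pvalues(x, labels, n_perm=1000, seed=0):
    """Two-sided per-gene p-values from a label-permutation null distribution."""
    rng = np.random.default_rng(seed)
    observed = np.abs(welch_t(x, labels))
    exceed = np.zeros(x.shape[0])
    for _ in range(n_perm):
        permuted = rng.permutation(labels)
        exceed += np.abs(welch_t(x, permuted)) >= observed
    # The +1 terms count the observed labelling itself, so no p-value is exactly 0.
    return (exceed + 1) / (n_perm + 1)


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (step-up false-discovery-rate control)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps adjusted values monotone.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


# Hypothetical data: 200 genes, 10 vs 10 samples, first 10 genes shifted in class 1.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 20))
labels = np.array([True] * 10 + [False] * 10)
x[:10, labels] += 4.0

t = welch_t(x, labels)
ranking = np.argsort(-np.abs(t))                        # step 1: gene ranking
q = benjamini_hochberg(permutation_pvalues(x, labels))  # step 2: significance level
selected = np.flatnonzero(q < 0.05)
print(ranking[:10], selected)
```

Permutation p-values avoid a normality assumption, but their resolution is limited by the number of distinct relabellings. With very few samples per class, no gene may reach a small adjusted p-value.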

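For class prediction, the abstract names model building and performance assessment but does not fix a classifier. This sketch assumes a nearest-centroid rule on the top-ranked genes, chosen only for brevity and not necessarily a method the review recommends. It estimates the error rate by cross-validation. The point it illustrates is that gene selection is part of model building and should be repeated inside every training fold. Selecting genes once from all samples lets the test samples shape the model. All names and the simulated data are hypothetical.

```python
import numpy as np


def top_genes(x, y, n_genes):
    """Indices of the n_genes rows of x with the largest |Welch t| between classes 1 and 0."""
    a, b = x[:, y == 1], x[:, y == 0]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    t = (a.mean(axis=1) - b.mean(axis=1)) / se
    return np.argsort(-np.abs(t))[:n_genes]


def fit_centroids(x, y):
    """Model building: mean expression profile of each class (row 0 = class 0, row 1 = class 1)."""
    return np.stack([x[:, y == c].mean(axis=1) for c in (0, 1)])


def predict(centroids, x):
    """Assign each sample (column of x) to the class whose centroid is nearest (Euclidean)."""
    dist = np.linalg.norm(x[None, :, :] - centroids[:, :, None], axis=1)
    return dist.argmin(axis=0)


def cv_error(x, y, n_genes=10, n_folds=5, select_in_fold=True, seed=0):
    """Performance assessment: cross-validated misclassification rate.

    With select_in_fold=False the genes are chosen once from all samples, which
    leaks test-sample information into the model and biases the error downwards.
    """
    rng = np.random.default_rng(seed)
    n = y.size
    folds = np.array_split(rng.permutation(n), n_folds)
    genes_all = top_genes(x, y, n_genes)
    errors = 0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        genes = top_genes(x[:, train], y[train], n_genes) if select_in_fold else genes_all
        model = fit_centroids(x[np.ix_(genes, train)], y[train])
        errors += int(np.sum(predict(model, x[np.ix_(genes, test)]) != y[test]))
    return errors / n


# Hypothetical data: 1000 genes, 20 vs 20 samples.
rng = np.random.default_rng(2)
y = np.array([0] * 20 + [1] * 20)
x_null = rng.normal(size=(1000, 40))   # no real class difference
x_signal = x_null.copy()
x_signal[:20, y == 1] += 1.5           # 20 hypothetical informative genes

err_null = cv_error(x_null, y)
err_biased = cv_error(x_null, y, select_in_fold=False)
err_signal = cv_error(x_signal, y)
print(err_null, err_biased, err_signal)
```

On the pure-noise matrix, the in-fold estimate should stay near the 0.5 chance rate. The variant that selects genes on all samples typically reports a much lower, spurious error. The folds here are not stratified by class, which is acceptable only because both classes are reasonably large.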