The Reproducibility of Lists of Differentially Expressed Genes in Microarray Studies

Reproducibility is a fundamental requirement in scientific experiments and clinical contexts. Recent publications raise concerns about the reliability of microarray technology because of the apparent lack of agreement between lists of differentially expressed genes (DEGs). In this study we demonstrate that (1) such discordance may stem from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion, the lists become much more reproducible, especially when fewer genes are selected; and (3) the instability of short DEG lists based on P cutoffs is an expected mathematical consequence of the high variability of the t-values. We recommend the use of FC ranking plus a non-stringent P cutoff as a baseline practice in order to generate more reproducible DEG lists. The FC criterion enhances reproducibility while the P criterion balances sensitivity and specificity.
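The recommended baseline practice can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that each gene already has a log2 fold change and a t-test P value, and the function name `select_degs`, the default cutoff of 0.05, and the list length `top_n` are hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple


def select_degs(
    stats: Dict[str, Tuple[float, float]],
    p_cutoff: float = 0.05,   # assumed "non-stringent" cutoff, illustrative only
    top_n: int = 100,         # assumed list length, illustrative only
) -> List[str]:
    """Filter genes by a P cutoff, then rank the survivors by fold change.

    `stats` maps a gene name to (log2 fold change, P value).
    """
    # Step 1: keep only genes that pass the non-stringent P cutoff.
    passing = [(gene, lfc) for gene, (lfc, p) in stats.items() if p < p_cutoff]
    # Step 2: rank by absolute log2 fold change, largest first.
    # Ties are broken by gene name so the output is deterministic.
    passing.sort(key=lambda item: (-abs(item[1]), item[0]))
    # Step 3: report the top_n genes as the DEG list.
    return [gene for gene, _ in passing[:top_n]]


# Toy input (made-up numbers, for illustration only).
example_stats = {
    "geneA": (2.5, 0.01),
    "geneB": (-3.0, 0.04),
    "geneC": (4.0, 0.30),    # largest FC, but fails the P cutoff
    "geneD": (0.4, 0.0001),  # smallest P, but small FC
    "geneE": (1.2, 0.02),
}

print(select_degs(example_stats, p_cutoff=0.05, top_n=3))
```

In this toy input, geneD has the smallest P value but ranks last among the genes that pass the cutoff, because the final ordering depends on fold change rather than on P.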
