Repeatability of published microarray gene expression analyses

Given the complexity of microarray-based gene expression studies, guidelines encourage transparent design and public data availability. Several journals require public data deposition, and several public databases exist. However, not all data are publicly available, and even when they are, it is unknown whether independent scientists can reproduce the published results. Here we evaluated whether the data analyses in 18 articles on microarray-based gene expression profiling, published in Nature Genetics in 2005–2006, could be replicated. One table or figure from each article was independently evaluated by two teams of analysts. We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability, and discrepancies were mostly due to incomplete data annotation or incomplete specification of data processing and analysis. Repeatability of published microarray studies is apparently limited. Stricter publication rules enforcing public data availability and explicit description of data processing and analysis should be considered.
