Assessing the validity and reproducibility of genome-scale predictions

MOTIVATION Validation and reproducibility of results are central and pressing issues in genomics. Several recent embarrassing incidents involving the irreproducibility of high-profile studies have illustrated the importance of these issues and the need for rigorous methods for assessing reproducibility. RESULTS Here, we describe an existing statistical model that is well suited to this problem. We explain its utility for assessing the reproducibility of validation experiments, and apply it to a genome-scale study of adenosine deaminase acting on RNA (ADAR)-mediated RNA editing in Drosophila. We also introduce a statistical method for planning validation experiments that, for a fixed total number of experiments, returns the number of replicates that yields the tightest reproducibility confidence limits. AVAILABILITY Downloadable software and a web service for both the analysis of data from a reproducibility study and the optimal design of these studies are provided at http://ccmbweb.ccv.brown.edu/reproducibility.html .
