Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures

There is a critical need for standard approaches to assess, report and compare the technical performance of genome-scale differential gene expression experiments. Here we assess technical performance with a proposed standard 'dashboard' of metrics derived from analysis of external spike-in RNA control ratio mixtures. These control ratio mixtures with defined abundance ratios enable assessment of diagnostic performance of differentially expressed transcript lists, limit of detection of ratio (LODR) estimates and expression ratio variability and measurement bias. The performance metrics suite is applicable to analysis of a typical experiment, and here we also apply these metrics to evaluate technical performance among laboratories. An interlaboratory study using identical samples shared among 12 laboratories with three different measurement processes demonstrates generally consistent diagnostic power across 11 laboratories. Ratio measurement variability and bias are also comparable among laboratories for the same measurement process. We observe different biases for measurement processes using different mRNA-enrichment protocols.

[1]  Martin Vingron,et al.  Variance stabilization applied to microarray data calibration and to the quantification of differential expression , 2002, ISMB.

[2]  S. Dudoit,et al.  STATISTICAL METHODS FOR IDENTIFYING DIFFERENTIALLY EXPRESSED GENES IN REPLICATED cDNA MICROARRAY EXPERIMENTS , 2002 .

[3]  David Botstein,et al.  BMC Genomics BioMed Central Methodology article Universal Reference RNA as a standard for microarray experiments , 2004 .

[4]  P. Kemmeren,et al.  Monitoring global messenger RNA changes in externally controlled microarray experiments , 2003, EMBO reports.

[5]  Jean YH Yang,et al.  Bioconductor: open software development for computational biology and bioinformatics , 2004, Genome Biology.

[6]  Thomas Lengauer,et al.  ROCR: visualizing classifier performance in R , 2005, Bioinform..

[7]  Kathleen F. Kerr,et al.  The External RNA Controls Consortium: a progress report , 2005, Nature Methods.

[8]  Rafael A. Irizarry,et al.  Bioinformatics and Computational Biology Solutions using R and Bioconductor , 2005 .

[9]  Weida Tong,et al.  Evaluation of external RNA controls for the assessment of microarray performance , 2006, Nature Biotechnology.

[10]  Klaus Obermayer,et al.  A new summarization method for affymetrix probe level data , 2006, Bioinform..

[11]  Maqc Consortium The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements , 2006, Nature Biotechnology.

[12]  Leming Shi,et al.  Using RNA sample titrations to assess microarray platform performance and normalization techniques , 2006, Nature Biotechnology.

[13]  Marc Salit,et al.  Standards in gene expression microarray experiments. , 2006, Methods in enzymology.

[14]  P S Pine,et al.  Use of diagnostic accuracy as a metric for evaluating laboratory proficiency with microarray assays using mixed-tissue RNA reference samples. , 2008, Pharmacogenomics.

[15]  R. Irizarry,et al.  Consolidated strategy for the analysis of microarray spike-in data , 2008, Nucleic acids research.

[16]  Hadley Wickham,et al.  ggplot2 - Elegant Graphics for Data Analysis (2nd Edition) , 2017 .

[17]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[18]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[19]  Ivo L. Hofacker,et al.  Hybridization thermodynamics of NimbleGen Microarrays , 2010, BMC Bioinformatics.

[20]  Jörg Rahnenführer,et al.  Robert Gentleman, Vincent Carey, Wolfgang Huber, Rafael Irizarry, Sandrine Dudoit (2005): Bioinformatics and Computational Biology Solutions Using R and Bioconductor , 2009 .

[21]  Steven J. M. Jones,et al.  Alternative expression analysis by RNA sequencing , 2010, Nature Methods.

[22]  M. Salit,et al.  Exploring the use of internal and externalcontrols for assessing microarray technical performance , 2010, BMC Research Notes.

[23]  Peter F. Stadler,et al.  G-stack modulated probe intensities on expression arrays - sequence corrections and signal calibration , 2010, BMC Bioinformatics.

[24]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..

[25]  M. Salit,et al.  Synthetic Spike-in Standards for Rna-seq Experiments Material Supplemental Open Access License Commons Creative , 2022 .

[26]  R. Sandberg,et al.  Full-Length mRNA-Seq from single cell levels of RNA and individual circulating tumor cells , 2012, Nature Biotechnology.

[27]  Davis J. McCarthy,et al.  Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation , 2012, Nucleic acids research.

[28]  David A. Orlando,et al.  Revisiting Global Gene Expression Analysis , 2012, Cell.

[29]  Steven P Lund,et al.  Statistical Applications in Genetics and Molecular Biology Detecting Differential Expression in RNA-sequence Data Using Quasi-likelihood with Shrunken Dispersion Estimates , 2012 .

[30]  Richard M Myers,et al.  Transposase mediated construction of RNA-seq libraries. , 2012, Genome research.

[31]  Chris Williams,et al.  RNA-SeQC: RNA-seq metrics for quality control and process optimization , 2012, Bioinform..

[32]  Peter A. Flach,et al.  Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them) , 2012, Briefings Bioinform..

[33]  W. Shi,et al.  The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote , 2013, Nucleic acids research.

[34]  Aviv Regev,et al.  Corrigendum: Comparative analysis of RNA sequencing methods for degraded or low-input samples , 2013, Nature Methods.

[35]  David P. Kreil,et al.  A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium , 2014, Nature Biotechnology.

[36]  Sheng Li,et al.  Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study , 2014, Nature Biotechnology.

[37]  David P. Kreil,et al.  The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance , 2014, Nature Biotechnology.

[38]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[39]  Wei Shi,et al.  Detecting and correcting systematic variation in large-scale RNA sequencing data , 2014, Nature Biotechnology.

[40]  Wei Shi,et al.  featureCounts: an efficient general purpose program for assigning sequence reads to genomic features , 2013, Bioinform..