Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories

RNA sequencing is an increasingly popular technology for genome-wide analysis of transcript sequence and abundance. However, understanding of the sources of technical and interlaboratory variation is still limited. To address this, the GEUVADIS consortium sequenced mRNAs and small RNAs of lymphoblastoid cell lines of 465 individuals in seven sequencing centers, with a large number of replicates. The variation between laboratories appeared to be considerably smaller than the already limited biological variation. Laboratory effects were mainly seen in differences in insert size and GC content and could be adequately corrected for. In small-RNA sequencing, the microRNA (miRNA) content differed widely between samples owing to competitive sequencing of rRNA fragments. This did not affect relative quantification of miRNAs. We conclude that distributing RNA sequencing among different laboratories is feasible, given proper standardization and randomization procedures. We provide a set of quality measures and guidelines for assessing technical biases in RNA-seq data.

[1]  Terrence S. Furey,et al.  The UCSC Table Browser data retrieval tool , 2004, Nucleic Acids Res..

[2]  J. Jurka,et al.  Repbase Update, a database of eukaryotic repetitive elements , 2005, Cytogenetic and Genome Research.

[3]  P. Collins,et al.  Performance comparison of one-color and two-color platforms within the Microarray Quality Control (MAQC) project , 2006, Nature Biotechnology.

[4]  Derek Gordon,et al.  Consequences of error , 2006 .

[5]  Maqc Consortium The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements , 2006, Nature Biotechnology.

[6]  Catalin C. Barbacioru,et al.  Evaluation of DNA microarray results with quantitative gene expression platforms , 2006, Nature Biotechnology.

[7]  Walter A. Kosters,et al.  Metrics for Mining Multisets , 2007, SGAI Conf..

[8]  John D. Storey,et al.  Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis , 2007, PLoS genetics.

[9]  R. Vossen,et al.  Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms , 2008, Nucleic acids research.

[10]  M. Stephens,et al.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. , 2008, Genome research.

[11]  S. Ranade,et al.  Stem cell transcriptome profiling via massive-scale mRNA sequencing , 2008, Nature Methods.

[12]  Mihaela Zavolan,et al.  Computational analysis of small RNA cloning data. , 2008, Methods.

[13]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[14]  M. van Iterson,et al.  Relative power and sample size analysis on gene expression profiling data , 2009, BMC Genomics.

[15]  M. Gerstein,et al.  Comparison and calibration of transcriptome data from RNA-Seq and tiling arrays , 2010, BMC Genomics.

[16]  M. Gerstein,et al.  RNA-Seq: a revolutionary tool for transcriptomics , 2009, Nature Reviews Genetics.

[17]  Crispin J. Miller,et al.  A comparison of massively parallel nucleotide sequencing with oligonucleotide microarrays for global transcription profiling , 2010, BMC Genomics.

[18]  A. Oshlack,et al.  Transcript length bias in RNA-seq data confounds systems biology , 2009, Biology Direct.

[19]  Xavier Estivill,et al.  SeqBuster, a bioinformatic tool for the processing and analysis of small RNAs datasets, reveals ubiquitous miRNA modifications in human embryonic cells , 2009, Nucleic acids research.

[20]  R. Guigó,et al.  Transcriptome genetics using second generation sequencing in a Caucasian population , 2010, Nature.

[21]  Joseph K. Pickrell,et al.  Understanding mechanisms underlying human gene expression variation with RNA sequencing , 2010, Nature.

[22]  K. Hansen,et al.  Biases in Illumina transcriptome sequencing caused by random hexamer priming , 2010, Nucleic acids research.

[23]  Leopold Parts,et al.  A Bayesian Framework to Account for Complex Non-Genetic Factors in Gene Expression Levels Greatly Increases Power in eQTL Studies , 2010, PLoS Comput. Biol..

[24]  T. Fennell,et al.  Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries , 2011, Genome Biology.

[25]  Cole Trapnell,et al.  Improving RNA-Seq expression estimates by correcting for fragment bias , 2011, Genome Biology.

[26]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..

[27]  D. Levy,et al.  A systematic comparison and evaluation of high density exon arrays and RNA-seq technology used to unravel the peripheral blood transcriptome of sickle cell disease , 2012, BMC Medical Genomics.

[28]  Sandrine Dudoit,et al.  GC-Content Normalization for RNA-Seq Data , 2011, BMC Bioinformatics.

[29]  Dan Wang,et al.  A comparison of RNA-Seq and high-density exon array for detecting differential gene expression between closely related species , 2010, Nucleic Acids Res..

[30]  Daniel Bottomly,et al.  Evaluating Gene Expression in C57BL/6J and DBA/2J Mouse Striatum Using RNA-Seq and Microarrays , 2011, PloS one.

[31]  Kui Zhang,et al.  Length bias correction for RNA-seq data in gene set analyses , 2011, Bioinform..

[32]  Fatih Ozsolak,et al.  RNA sequencing: advances, challenges and opportunities , 2011, Nature Reviews Genetics.

[33]  Ritsert C. Jansen,et al.  MixupMapper: correcting sample mix-ups in genome-wide datasets increases power to detect small genetic effects , 2011, Bioinform..

[34]  Jingyuan Fu,et al.  Trans-eQTLs Reveal That Independent Genetic Variants Associated with a Complex Phenotype Converge on Intermediate Genes, with a Major Role for the HLA , 2011, PLoS genetics.

[35]  Y. Benjamini,et al.  Summarizing and correcting the GC content bias in high-throughput sequencing , 2012, Nucleic acids research.

[36]  Roderic Guigó,et al.  The GEM mapper: fast, accurate and versatile alignment by filtration , 2012, Nature Methods.

[37]  Anton J. Enright,et al.  Extent, Causes, and Consequences of Small RNA Expression Variation in Human Adipose Tissue , 2012, PLoS genetics.

[38]  R. Guigó,et al.  Modelling and simulating generic RNA-Seq experiments with the flux simulator , 2012, Nucleic acids research.

[39]  Wei Li,et al.  RSeQC: quality control of RNA-seq experiments , 2012, Bioinform..

[40]  Heather J. Ruskin,et al.  RNA-Seq vs Dual- and Single-Channel Microarray Data: Sensitivity Analysis for Differential Expression and Clustering , 2012, PloS one.

[41]  R. Durbin,et al.  Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses , 2012, Nature Protocols.

[42]  Jun Chen,et al.  A tool for RNA sequencing sample identity check , 2013, Bioinform..

[43]  Pedro G. Ferreira,et al.  Transcriptome and genome sequencing uncovers functional variation in humans , 2013, Nature.