AuPairWise: A Method to Estimate RNA-Seq Replicability through Co-expression

In addition to detecting novel transcripts and higher dynamic range, a principal claim for RNA-sequencing has been greater replicability, typically measured in sample-sample correlations of gene expression levels. Through a re-analysis of ENCODE data, we show that replicability of transcript abundances will provide misleading estimates of the replicability of conditional variation in transcript abundances (i.e., most expression experiments). Heuristics which implicitly address this problem have emerged in quality control measures to obtain ‘good’ differential expression results. However, these methods involve strict filters such as discarding low expressing genes or using technical replicates to remove discordant transcripts, and are costly or simply ad hoc. As an alternative, we model gene-level replicability of differential activity using co-expressing genes. We find that sets of housekeeping interactions provide a sensitive means of estimating the replicability of expression changes, where the co-expressing pair can be regarded as pseudo-replicates of one another. We model the effects of noise that perturbs a gene’s expression within its usual distribution of values and show that perturbing expression by only 5% within that range is readily detectable (AUROC~0.73). We have made our method available as a set of easily implemented R scripts.

[1]  Susan R. Wilson,et al.  Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing , 2012, BMC Genomics.

[2]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[3]  Thomas Ragg,et al.  The RIN: an RNA integrity number for assigning integrity values to RNA measurements , 2006, BMC Molecular Biology.

[4]  David R. Kelley,et al.  Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks , 2012, Nature Protocols.

[5]  Maqc Consortium The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements , 2006, Nature Biotechnology.

[6]  Sara Ballouz,et al.  Guidance for RNA-seq co-expression network construction and analysis: safety in numbers , 2015, Bioinform..

[7]  Neal S. Holter,et al.  Fundamental patterns underlying gene expression profiles: simplicity from complexity. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[8]  Xiang Wan,et al.  Gemma: a resource for the reuse, sharing and meta-analysis of expression profiling data , 2012, Bioinform..

[9]  T. Ideker,et al.  Differential network biology , 2012, Molecular systems biology.

[10]  Russ B. Altman,et al.  Missing value estimation methods for DNA microarrays , 2001, Bioinform..

[11]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[12]  R. Fisher FREQUENCY DISTRIBUTION OF THE VALUES OF THE CORRELATION COEFFIENTS IN SAMPLES FROM AN INDEFINITELY LARGE POPU;ATION , 1915 .

[13]  Xiaoli Li,et al.  Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery , 2011, BMC Bioinformatics.

[14]  Nadav S. Bar,et al.  Landscape of transcription in human cells , 2012, Nature.

[15]  S. Dudoit,et al.  Normalization of RNA-seq data using factor analysis of control genes or samples , 2014, Nature Biotechnology.

[16]  Wei Li,et al.  RSeQC: quality control of RNA-seq experiments , 2012, Bioinform..

[17]  Luming Qu,et al.  Development of ERCC RNA Spike-In Control Mixes , 2011 .

[18]  Wei Shi,et al.  Detecting and correcting systematic variation in large-scale RNA sequencing data , 2014, Nature Biotechnology.

[19]  John D. Storey,et al.  Human transcriptome array for high-throughput clinical studies , 2011, Proceedings of the National Academy of Sciences.

[20]  Peter J. Bickel,et al.  Measuring reproducibility of high-throughput experiments , 2011, 1110.4705.

[21]  Raymond K. Auerbach,et al.  A User's Guide to the Encyclopedia of DNA Elements (ENCODE) , 2011, PLoS biology.

[22]  Joshua M. Stuart,et al.  A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules , 2003, Science.

[23]  David P. Kreil,et al.  The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance , 2014, Nature Biotechnology.

[24]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[25]  Ross Ihaka,et al.  Gentleman R: R: A language for data analysis and graphics , 1996 .

[26]  Sheng Li,et al.  Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study , 2014, Nature Biotechnology.

[27]  Shyr Yu,et al.  Sample size calculation based on exact test for assessing differential expression analysis in RNA-seq data , 2013, BMC Bioinformatics.

[28]  David P. Kreil,et al.  A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium , 2014, Nature Biotechnology.

[29]  Arvind Ramanathan,et al.  ORBiT: Oak Ridge biosurveillance toolkit for public health dynamics , 2015, BMC Bioinformatics.

[30]  Terence P. Speed,et al.  Systematic noise degrades gene co-expression signals but can be corrected , 2015, BMC Bioinformatics.

[31]  Naftali Kaminski,et al.  MetaQC: objective quality control and inclusion/exclusion criteria for genomic meta-analysis , 2011, Nucleic acids research.

[32]  Colin N. Dewey,et al.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome , 2011, BMC Bioinformatics.

[33]  Fabian J Theis,et al.  Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells , 2015, Nature Biotechnology.

[34]  M. Stephens,et al.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. , 2008, Genome research.

[35]  Mark Gerstein,et al.  RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries , 2010, Bioinform..

[36]  Andrew E. Jaffe,et al.  Bioinformatics Applications Note Gene Expression the Sva Package for Removing Batch Effects and Other Unwanted Variation in High-throughput Experiments , 2022 .

[37]  Avi Ma'ayan,et al.  Lean Big Data integration in systems biology and systems pharmacology. , 2014, Trends in pharmacological sciences.

[38]  David M. Simcha,et al.  Tackling the widespread and critical impact of batch effects in high-throughput data , 2010, Nature Reviews Genetics.

[39]  Jie Zhou,et al.  RNA-seq differential expression studies: more sequence or more replication? , 2014, Bioinform..

[40]  K. Hansen,et al.  Sequencing technology does not eliminate biological variability , 2011, Nature Biotechnology.

[41]  T. Therneau,et al.  SERE: Single-parameter quality control and sample comparison for RNA-Seq , 2012, BMC Genomics.

[42]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[43]  Hanlee P. Ji,et al.  The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. , 2006, Nature biotechnology.

[44]  Chris Williams,et al.  RNA-SeQC: RNA-seq metrics for quality control and process optimization , 2012, Bioinform..