Using Mixtures of Biological Samples as Genome-Scale Process Controls

Genome-scale “-omics” measurements are challenging to benchmark because of the enormous variety of unique biological molecules involved. Mixtures of previously characterized samples can be used to benchmark repeatability and reproducibility, with the known component proportions serving as truth for the measurement. We describe and evaluate experiments characterizing the performance of RNA-sequencing (RNA-Seq) measurements. The parameters of a model fit to a measured -omic profile can be evaluated to assess the bias and variability of the genome-scale measurement of a mixture. A linear model describes the behavior of expression measures of mixtures and provides a context for performance benchmarking. Residuals from fitting the model to experimental data can be used as a metric for evaluating the effect that an individual step in an experimental process has on the linear response function and precision of the underlying measurement, while also identifying signals affected by interference from other sources. Effective benchmarking requires well-defined mixtures, which for RNA-Seq requires knowledge of the messenger RNA (mRNA) content of the individual components. We demonstrate and evaluate an experimental method suitable for use in genome-scale process control and lay out a method that uses spike-in controls to determine mRNA content. Genome-scale process controls can be derived from mixtures. These controls relate prior knowledge of individual components to a complex mixture, allowing assessment of measurement performance. The mRNA fraction accounts for differential enrichment of mRNA from varying total RNA samples. Spike-in controls can be used to measure this relationship between mRNA content and input total RNA. Analysis of mixtures can also be employed to determine the composition and proportions of an unknown sample, even when component-specific markers are not previously known, so long as the pure components can be measured alongside the mixture.
