MPRAudit Quantifies the Fraction of Variance Described by Unknown Features in Massively Parallel Reporter Assays

Transformative advances in molecular technologies, such as massively parallel reporter assays (MPRAs) and CRISPR screens, can efficiently characterize the effects of genetic and genomic variation on cellular phenotypes. Analysis approaches to date have focused on identifying individual genomic regions or genetic variants that perturb a phenotype of interest. In this work, we develop a holistic framework (MPRAudit) to determine the global contribution of sequence to phenotypic variation, either across the entire experiment or within subsets of it, opening the door to myriad novel analyses. For example, MPRAudit can reliably estimate the upper limit of predictive performance, the fraction of variation attributed to specific biological categories, and the total contribution of experimental noise. We demonstrate, through simulation and application to several types of real MPRA data sets, how MPRAudit can improve our understanding of experimental quality and molecular biology and can guide future research. Applying MPRAudit to real MPRA data, we observe that sequence variation is the primary driver of outcome variability, but that known biological categories explain only a fraction of this variance. We conclude that our understanding of how sequence variation impacts phenotype, even at the level of MPRAs, remains open to further scientific discovery.
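To make the idea of separating sequence-driven variation from experimental noise concrete, the following is a minimal sketch on simulated data. It is not the MPRAudit implementation: it assumes that each sequence has several replicate measurements, that technical variance per sequence can be approximated by a delete-one jackknife of the replicate mean, and that the sequence-attributable fraction is one minus the ratio of average technical variance to total variance. All function names, parameters, and simulated values are hypothetical.

```python
import numpy as np

def jackknife_variance_of_mean(values):
    """Delete-one jackknife estimate of the variance of the sample mean."""
    values = np.asarray(values, dtype=float)
    n = values.size
    # Mean of the sample with each observation left out in turn.
    loo_means = (values.sum() - values) / (n - 1)
    return (n - 1) / n * np.sum((loo_means - loo_means.mean()) ** 2)

def fraction_variance_from_sequence(replicates):
    """Illustrative estimate: 1 - (mean technical variance / total variance).

    `replicates` is a (n_sequences, n_replicates) array of activities.
    This is an assumed, simplified estimator, not the published method.
    """
    replicates = np.asarray(replicates, dtype=float)
    seq_means = replicates.mean(axis=1)
    total_var = seq_means.var(ddof=1)  # variance across sequences
    technical_var = np.mean([jackknife_variance_of_mean(r) for r in replicates])
    return 1.0 - technical_var / total_var

# Simulated experiment (hypothetical numbers): unit-variance sequence
# effects plus unit-variance replicate noise.
rng = np.random.default_rng(0)
n_seq, n_rep = 2000, 10
true_effect = rng.normal(0.0, 1.0, n_seq)
data = true_effect[:, None] + rng.normal(0.0, 1.0, (n_seq, n_rep))

estimate = fraction_variance_from_sequence(data)
# Under this simulation the target value is 1 / (1 + 1/n_rep).
expected = 1.0 / (1.0 + 1.0 / n_rep)
print(f"estimated fraction: {estimate:.3f}, simulation target: {expected:.3f}")
```

In this toy setting the estimated fraction acts as a ceiling on the variance that any sequence-based predictor could explain; restricting `data` to the rows of one biological category would give the analogous quantity within that subset.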
