Linear models enable powerful differential activity analysis in massively parallel reporter assays

Background
Massively parallel reporter assays (MPRAs) have emerged as a popular means of understanding noncoding variation in a variety of conditions. While a large number of experiments have been described in the literature, analysis typically relies on ad-hoc methods, and little attention has been paid to comparing the performance of methods across datasets.

Results
We present mpralm, a method that we show to be calibrated and powerful by analyzing its performance on multiple MPRA datasets. In the first comprehensive evaluation of statistical methods on several datasets, we show that it outperforms existing statistical methods for analyzing this data type. We investigate theoretical and real-data properties of barcode summarization methods and show an unappreciated impact of the choice of summarization method for some datasets. Finally, we use our model to conduct a power analysis for this assay and show substantial improvements in power from performing up to 6 replicates per condition, whereas sequencing depth has a smaller impact; we recommend always using at least 4 replicates. An R package is available from the Bioconductor project.

Conclusions
Together, these results inform recommendations for differential analysis, general group comparisons, and power analysis, and will help improve the design and analysis of MPRA experiments.
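The abstract contrasts barcode summarization methods and a linear-model test for differential activity without spelling either out. The Python sketch below illustrates both ideas under stated assumptions. It computes an element's activity as a log2 RNA/DNA ratio either by summing barcode counts first (aggregation) or by averaging per-barcode log ratios, and it compares two conditions with the pooled-variance t-statistic for the group coefficient in a simple linear model. The pseudocount of 0.5, the function names, and the toy counts are all hypothetical. The sketch also omits the variance moderation and precision weighting that a limma-style analysis would add, so it should not be read as the mpralm implementation.

```python
import math
from statistics import mean, variance

# Hypothetical pseudocount added to every count so log2 is defined at zero.
PSEUDO = 0.5


def aggregate_log_ratio(rna_counts, dna_counts):
    """Aggregation: sum counts over an element's barcodes, then take log2(RNA/DNA)."""
    return math.log2((sum(rna_counts) + PSEUDO) / (sum(dna_counts) + PSEUDO))


def average_log_ratio(rna_counts, dna_counts):
    """Averaging: take log2(RNA/DNA) for each barcode, then average across barcodes."""
    return mean(
        math.log2((r + PSEUDO) / (d + PSEUDO))
        for r, d in zip(rna_counts, dna_counts)
    )


def two_group_t(group_a, group_b):
    """Pooled-variance two-sample t-statistic for one element.

    With one activity value per replicate, this equals the t-statistic for the
    group coefficient b1 in the ordinary linear model y = b0 + b1 * group + error.
    Each group needs at least two replicates.
    """
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(group_b) - mean(group_a)) / se


# Toy barcode counts for one element in one replicate (hypothetical data).
rna, dna = [10, 30], [10, 10]
print(round(aggregate_log_ratio(rna, dna), 2))  # 0.98
print(round(average_log_ratio(rna, dna), 2))    # 0.77

# Toy per-replicate activities in two conditions (hypothetical data).
print(round(two_group_t([0.0, 0.2, 0.1], [1.0, 1.2, 1.1]), 2))  # 12.25
```

On these toy counts the two summaries disagree for the same element, a small illustration of why the choice of summarization method can matter when barcode counts are uneven.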

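The power analysis summarized in the Results can be illustrated with a rough normal-approximation calculation for detecting a difference `delta` in mean log-ratio activity between two conditions, assuming a hypothetical per-replicate standard deviation `sigma` and a two-sided test at level `alpha`. This sketch is not the paper's power analysis, which is based on the fitted model. It ignores sequencing depth entirely, and it overstates power at very small replicate numbers because it uses a normal rather than a t reference distribution.

```python
from statistics import NormalDist


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test.

    delta: true difference in mean log-ratio activity between conditions.
    sigma: per-replicate standard deviation of the activity (assumed equal in both groups).
    n_per_group: number of replicates in each condition.
    """
    std_normal = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5       # standard error of the difference in means
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = abs(delta) / se                     # standardized effect
    # Probability that the statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical effect size and noise level, chosen only for illustration.
for n in range(2, 7):
    print(n, round(approx_power(delta=1.0, sigma=0.8, n_per_group=n), 3))
```

Under these made-up settings the approximate power rises with each added replicate. The exact numbers depend entirely on the assumed `delta` and `sigma`.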