MPRAnalyze: statistical framework for massively parallel reporter assays

Massively parallel reporter assays (MPRAs) are a technique that enables testing thousands of regulatory DNA sequences and their variants in a single, quantitative experiment. Despite their growing popularity, there is a lack of statistical methods that account for the different sources of uncertainty inherent to these assays and thus fully leverage their promise. Development of such methods could enhance our ability to identify regulatory sequences in the genome, understand their function under various settings, and ultimately gain a better understanding of how the regulatory code and its alteration lead to phenotypic consequences. Here we present MPRAnalyze: a statistical framework dedicated to analyzing MPRA count data. MPRAnalyze addresses the two major questions posed in the context of MPRA experiments: estimating the magnitude of the effect of a regulatory sequence in a single-condition setting, and comparing the activity of regulatory sequences across multiple conditions. The framework uses a nested construction of generalized linear models to account for uncertainty in both DNA and RNA observations, controls for various sources of unwanted variation, and incorporates negative controls for robust hypothesis testing, thereby providing clear quantitative answers in complex experimental settings. We demonstrate the robustness, accuracy and applicability of MPRAnalyze on simulated data and published data sets and compare it against existing analysis methodologies. MPRAnalyze is implemented as an R package and is publicly available through Bioconductor [1].
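To make the idea of a nested DNA/RNA construction concrete, the following is a toy sketch in Python. It is not the MPRAnalyze implementation, which is an R package that fits generalized linear models. The sketch assumes, purely for illustration, that each barcode has a latent gamma-distributed plasmid abundance, that the observed DNA and RNA counts are Poisson draws around that abundance, and that the RNA mean equals the abundance scaled by a transcription rate `alpha`. The function names, the distributional choices, and the ratio-of-sums estimator are all assumptions of this sketch.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_barcodes(rng, n_barcodes, alpha, shape=5.0, scale=10.0):
    """Simulate (dna, rna) count pairs for one regulatory sequence.

    Assumed toy model: latent abundance ~ Gamma(shape, scale),
    DNA count ~ Poisson(latent), RNA count ~ Poisson(alpha * latent).
    """
    pairs = []
    for _ in range(n_barcodes):
        latent = rng.gammavariate(shape, scale)
        dna = sample_poisson(rng, latent)
        rna = sample_poisson(rng, alpha * latent)
        pairs.append((dna, rna))
    return pairs


def estimate_alpha(pairs):
    """Estimate the transcription rate as total RNA over total DNA.

    Pooling counts across barcodes before taking the ratio avoids the
    instability of per-barcode ratios when individual DNA counts are small.
    """
    total_dna = sum(d for d, _ in pairs)
    total_rna = sum(r for _, r in pairs)
    if total_dna == 0:
        raise ValueError("no DNA counts observed; alpha is not identifiable")
    return total_rna / total_dna


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = simulate_barcodes(rng, n_barcodes=500, alpha=2.0)
    print(f"estimated alpha: {estimate_alpha(pairs):.3f}")
```

The point of the sketch is only that the RNA signal is interpreted relative to a noisy DNA measurement rather than on its own. A full analysis would additionally model library depth, batch effects, and negative controls, as the abstract describes.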

[1] William Stafford Noble, et al. FIMO: scanning for occurrences of a given motif, 2011, Bioinform.

[2] Nadav Ahituv, et al. Minor Loops in Major Folds: Enhancer–Promoter Looping, Chromatin Restructuring, and Their Association with Transcriptional Regulation and Disease, 2015, PLoS Genetics.

[3] Tsippi Iny Stein, et al. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses, 2016, Current Protocols in Bioinformatics.

[4] W. Huber, et al. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, 2014, Genome Biology.

[5] B. Cohen, et al. High-throughput functional testing of ENCODE segmentation predictions, 2014, Genome Research.

[6] J. T. Erichsen, et al. Enhancer Evolution across 20 Mammalian Species, 2015, Cell.

[7] R. Young, et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state, 2010, Proceedings of the National Academy of Sciences.

[8] Michael J. Ziller, et al. Transcription factor binding dynamics during human ESC differentiation, 2015, Nature.

[9] Kasper Daniel Hansen, et al. Linear models enable powerful differential activity analysis in massively parallel reporter assays, 2017.

[10] Christopher D. Brown, et al. QuASAR‐MPRA: accurate allele‐specific analysis for massively parallel reporter assays, 2018, Bioinform.

[11] Raphael Gottardo, et al. Orchestrating high-throughput genomic analysis with Bioconductor, 2015, Nature Methods.

[12] A. Visel, et al. Homotypic clusters of transcription factor binding sites are a key component of human promoters and enhancers, 2010, Genome Research.

[13] Wen‐Teng Chang, et al. A novel function of transcription factor α-Pal/NRF-1: Increasing neurite outgrowth, 2005.

[14] Eric S. Lander, et al. Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay, 2016, Cell.

[15] Wen‐Teng Chang, et al. A novel function of transcription factor alpha-Pal/NRF-1: increasing neurite outgrowth, 2005, Biochemical and Biophysical Research Communications.

[16] John G. Flannery, et al. Massively parallel cis-regulatory analysis in the mammalian central nervous system, 2016, Genome Research.

[17] A. Visel, et al. ChIP-seq accurately predicts tissue-specific activity of enhancers, 2009, Nature.

[18] Chengyu Liu, et al. Transcription factor TEAD2 is involved in neural tube closure, 2007, Genesis.

[19] Nir Yosef, et al. Massively parallel characterization of regulatory dynamics during neural induction, 2018, bioRxiv.

[20] Jacob C. Ulirsch, et al. Systematic Functional Dissection of Common Genetic Variation Affecting Red Blood Cell Traits, 2016, Cell.

[21] Michael J. Ziller, et al. Transcriptional and Epigenetic Dynamics during Specification of Human Embryonic Stem Cells, 2013, Cell.

[22] F. Collins, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits, 2009, Proceedings of the National Academy of Sciences.

[23] Nathaniel D. Heintzman, et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression, 2009, Nature.

[24] Michael T. McManus, et al. A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity, 2016, bioRxiv.