Bayesian estimation of genetic regulatory effects in high-throughput reporter assays

MOTIVATION High-throughput reporter assays dramatically improve our ability to assign function to noncoding genetic variants by measuring allelic effects on gene expression in the controlled setting of a reporter gene. Unlike genetic association tests, such assays are not confounded by linkage disequilibrium when loci are independently assayed. These methods can thus improve the identification of causal disease mutations. While work continues on improving experimental aspects of these assays, less effort has gone into developing methods for assessing the statistical significance of assay results, particularly in the case of rare variants captured from patient DNA.

RESULTS We describe a Bayesian hierarchical model, called Bayesian Inference of Regulatory Differences (BIRD), which integrates prior information and explicitly accounts for variability between experimental replicates. The model produces substantially more accurate predictions than existing methods when allele frequencies are low, a clear advantage in the search for disease-causing variants in DNA captured from patient cohorts. Using the model, we demonstrate a clear tradeoff between variant sequencing coverage and the number of biological replicates, and we show that the use of additional biological replicates decreases the variance of effect-size estimates, due to the properties of the Poisson-binomial distribution. We also provide a power and sample size calculator, which supports decisions about experimental design parameters.

AVAILABILITY The software is freely available from www.geneprediction.org/bird. The experimental design web tool can be accessed at http://67.159.92.22:8080.

SUPPLEMENTARY INFORMATION Supplementary information is available online.
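The tradeoff between coverage and replicates described under RESULTS can be illustrated with a small simulation. The sketch below is not the BIRD model: it is a toy beta-binomial setup in which each replicate draws its own RNA alternate-allele fraction around a shared mean, and a fixed read budget is split evenly across replicates. The function name, the default allele frequency, the effect size, and the `concentration` parameter controlling between-replicate variability are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def simulate_log_odds_variance(n_replicates, total_reads, p_alt=0.05,
                               effect=2.0, concentration=50.0,
                               n_sims=2000, seed=0):
    """Variance of a naive log-odds effect estimate under a toy model.

    Hypothetical setup (not the BIRD model): the alternate allele has DNA
    frequency `p_alt` and is `effect` times as active as the reference, and
    `total_reads` RNA reads are split evenly across `n_replicates`.
    """
    rng = random.Random(seed)
    # Expected alternate-allele fraction among RNA reads.
    q = p_alt * effect / (p_alt * effect + (1.0 - p_alt))
    reads_per_rep = total_reads // n_replicates
    log_odds_dna = math.log(p_alt / (1.0 - p_alt))

    estimates = []
    for _sim in range(n_sims):
        alt = 0
        for _rep in range(n_replicates):
            # Each replicate gets its own allele fraction (replicate noise).
            q_rep = rng.betavariate(q * concentration,
                                    (1.0 - q) * concentration)
            # Binomial read sampling within the replicate.
            alt += sum(rng.random() < q_rep for _read in range(reads_per_rep))
        ref = reads_per_rep * n_replicates - alt
        # Pseudo-counts of 0.5 keep the log finite when a count is zero.
        log_odds_rna = math.log((alt + 0.5) / (ref + 0.5))
        estimates.append(log_odds_rna - log_odds_dna)
    return statistics.variance(estimates)


if __name__ == "__main__":
    # Same total read budget, different numbers of replicates.
    for reps in (1, 4, 10):
        var = simulate_log_odds_variance(reps, total_reads=400)
        print(f"replicates={reps:2d}  variance of estimate={var:.4f}")
```

Under these assumptions, spreading the same number of reads over more replicates averages out replicate-level noise, so the variance of the pooled estimate shrinks. How large the gain is depends on the assumed between-replicate variability, which this sketch fixes arbitrarily.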
