A Bayesian Method to Incorporate Hundreds of Functional Characteristics with Association Evidence to Improve Variant Prioritization

The increasing quantity and quality of functional genomic information motivate the assessment and integration of these data with association data, including data originating from genome-wide association studies (GWAS). We used previously described GWAS signals (“hits”) to train a regularized logistic model in order to predict SNP causality on the basis of a large multivariate functional dataset. We show how this model can be used to derive Bayes factors for integrating functional and association data into a combined Bayesian analysis. Functional characteristics were obtained from the Encyclopedia of DNA Elements (ENCODE), from published expression quantitative trait loci (eQTL), and from other sources of genome-wide characteristics. We trained the model using all GWAS signals combined, and also using phenotype specific signals for autoimmune, brain-related, cancer, and cardiovascular disorders. The non-phenotype specific and the autoimmune GWAS signals gave the most reliable results. We found SNPs with higher probabilities of causality from functional characteristics showed an enrichment of more significant p-values compared to all GWAS SNPs in three large GWAS studies of complex traits. We investigated the ability of our Bayesian method to improve the identification of true causal signals in a psoriasis GWAS dataset and found that combining functional data with association data improves the ability to prioritise novel hits. We used the predictions from the penalized logistic regression model to calculate Bayes factors relating to functional characteristics and supply these online alongside resources to integrate these data with association data.

[1]  P. Visscher,et al.  Common polygenic variation contributes to risk of schizophrenia and bipolar disorder , 2009, Nature.

[2]  Jon Wakefield,et al.  A Bayesian measure of the probability of false discovery in genetic epidemiology studies. , 2007, American journal of human genetics.

[3]  M. Brown,et al.  Promise and pitfalls of the Immunochip , 2011, Arthritis research & therapy.

[4]  Data production leads,et al.  An integrated encyclopedia of DNA elements in the human genome , 2012 .

[5]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[6]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[7]  Tom R. Gaunt,et al.  Genetic Variants in Novel Pathways Influence Blood Pressure and Cardiovascular Disease Risk , 2011, Nature.

[8]  F. Collins,et al.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits , 2009, Proceedings of the National Academy of Sciences.

[9]  M. Stephens,et al.  Bayesian statistical methods for genetic association studies , 2009, Nature Reviews Genetics.

[10]  Bronwen L. Aken,et al.  GENCODE: The reference human genome annotation for The ENCODE Project , 2012, Genome research.

[11]  Shane J. Neph,et al.  Systematic Localization of Common Disease-Associated Variation in Regulatory DNA , 2012, Science.

[12]  D. Altshuler,et al.  A map of human genome variation from population-scale sequencing , 2010, Nature.

[13]  Thomas Lengauer,et al.  ROCR: visualizing classifier performance in R , 2005, Bioinform..

[14]  Washington Seattle An integrated encyclopedia of DNA elements in the human genome , 2016 .

[15]  Matti Pirinen,et al.  A genome-wide association study identifies new psoriasis susceptibility loci and an interaction between HLA-C and ERAP1 , 2010, Nature Genetics.

[16]  Nicholas R. Lemoine,et al.  SNPnexus: a web server for functional annotation of novel and publicly known genetic variants (2012 update) , 2012, Nucleic Acids Res..

[17]  C. Liew,et al.  "Chip"ping away at heart failure. , 2006, Methods in molecular medicine.

[18]  Manuel A. R. Ferreira,et al.  GENOVA: Gene Overlap Analysis of GWAS Results , 2012, Statistical applications in genetics and molecular biology.

[19]  Gonçalo R. Abecasis,et al.  The variant call format and VCFtools , 2011, Bioinform..

[20]  Chris S Haley,et al.  The genomic signature of trait-associated variants , 2013, BMC Genomics.

[21]  O. Andreassen,et al.  All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs , 2013, PLoS genetics.

[22]  Raymond K. Auerbach,et al.  An Integrated Encyclopedia of DNA Elements in the Human Genome , 2012, Nature.

[23]  Ayellet V. Segrè,et al.  Hundreds of variants clustered in genomic loci and biological pathways affect human height , 2010, Nature.

[24]  Matti Pirinen,et al.  Identification of 15 new psoriasis susceptibility loci highlights the role of innate immunity , 2012 .

[25]  Eurie L. Hong,et al.  Annotation of functional variation in personal genomes using RegulomeDB , 2012, Genome research.

[26]  Manolis Kellis,et al.  HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants , 2011, Nucleic Acids Res..

[27]  Simon C. Potter,et al.  Genome-wide Association Analysis Identifies 14 New Risk Loci for Schizophrenia , 2013, Nature Genetics.

[28]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[29]  Luigi Ferrucci,et al.  Abundant Quantitative Trait Loci Exist for DNA Methylation and Gene Expression in Human Brain , 2010, PLoS genetics.

[30]  John D. Storey,et al.  Mapping the Genetic Architecture of Gene Expression in Human Liver , 2008, PLoS biology.

[31]  Eleazar Eskin,et al.  Incorporating prior information into association studies , 2012, Bioinform..

[32]  N. Cox,et al.  Trait-Associated SNPs Are More Likely to Be eQTLs: Annotation to Enhance Discovery from GWAS , 2010, PLoS genetics.

[33]  C. Burge,et al.  Conserved Seed Pairing, Often Flanked by Adenosines, Indicates that Thousands of Human Genes are MicroRNA Targets , 2005, Cell.

[34]  R. Guigó,et al.  Transcriptome genetics using second generation sequencing in a Caucasian population , 2010, Nature.

[35]  S. Batzoglou,et al.  Linking disease associations with regulatory information in the human genome , 2012, Genome research.

[36]  Gerome Breen,et al.  Using Functional Annotation for the Empirical Determination of Bayes Factors for Genome-Wide Association Study Analysis , 2011, PloS one.

[37]  A. Ramasamy,et al.  Quality control parameters on a large dataset of regionally dissected human control brains for whole genome expression studies , 2011, Journal of neurochemistry.

[38]  Terrence S. Furey,et al.  The UCSC Table Browser data retrieval tool , 2004, Nucleic Acids Res..

[39]  Raymond K. Auerbach,et al.  A User's Guide to the Encyclopedia of DNA Elements (ENCODE) , 2011, PLoS biology.

[40]  R. Redon,et al.  Relative Impact of Nucleotide and Copy Number Variation on Gene Expression Phenotypes , 2007, Science.

[41]  ENCODEConsortium,et al.  An Integrated Encyclopedia of DNA Elements in the Human Genome , 2012, Nature.