GPA: A statistical approach to prioritizing GWAS results by integrating pleiotropy information and annotation data

Genome-wide association studies (GWAS) suggest that a complex disease is typically affected by many genetic variants with small or moderate effects. Identifying these risk variants remains a very challenging problem. Traditional approaches that focus on a single GWAS dataset ignore relevant information that could improve our ability to detect these variants. We propose a novel statistical approach, named GPA, for the integrative analysis of multiple GWAS datasets and functional annotations. We developed hypothesis testing procedures to facilitate statistical inference of pleiotropy and enrichment of functional annotation. We applied our approach to a systematic analysis of five psychiatric disorders. GPA not only identified many weak signals missed by the original single-phenotype analysis but also revealed interesting genetic architectures of these disorders. We also applied GPA to bladder cancer GWAS data together with ENCODE DNase-seq data from 125 cell lines and showed that GPA can detect cell lines that are more biologically relevant to the phenotype of interest.
