GPA-Tree: statistical approach for functional-annotation-tree-guided prioritization of GWAS results

MOTIVATION Despite the great success of genome-wide association studies (GWAS), multiple challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), each with a small or moderate effect size. Second, our understanding of the functional mechanisms through which genetic variants are associated with complex traits is still limited. To address these challenges, we propose GPA-Tree, which simultaneously performs association mapping and identifies key combinations of functional annotations related to risk-associated SNPs by combining a decision tree algorithm with a hierarchical modeling framework.

RESULTS First, we conducted simulation studies to evaluate the proposed GPA-Tree method and compared its performance with existing statistical approaches. The results indicate that GPA-Tree outperforms existing statistical approaches in detecting risk-associated SNPs and identifies the true combinations of functional annotations with high accuracy. Second, we applied GPA-Tree to a systemic lupus erythematosus (SLE) GWAS together with functional annotation data including GenoSkyline and GenoSkylinePlus. The results from GPA-Tree highlight the dysregulation of blood immune cells in SLE, including but not limited to primary B cells, memory helper T cells, regulatory T cells, neutrophils and CD8+ memory T cells. These results demonstrate that GPA-Tree can be a powerful tool that improves association mapping while facilitating understanding of the underlying genetic architecture of complex traits and of the potential mechanisms linking risk-associated SNPs with complex traits.

AVAILABILITY The GPATree software is available at https://dongjunchung.github.io/GPATree/.

SUPPLEMENTARY INFORMATION Supplementary information is available at Bioinformatics online.
