Encore: Genetic Association Interaction Network Centrality Pipeline and Application to SLE Exome Data

Open source tools are needed to facilitate the construction, analysis, and visualization of gene‐gene interaction networks for sequencing data. To address this need, we present Encore, an open source network analysis pipeline for genome‐wide association studies and rare variant data. Encore constructs Genetic Association Interaction Networks, or epistasis networks, using either of two approaches: our previous information‐theory method or a generalized linear model approach. Encore also includes multiple data filtering options: Random Forest/Random Jungle for main effect enrichment, and Evaporative Cooling and Relief‐F filters for enrichment of interaction effects. Encore implements SNPrank network centrality for identifying susceptibility hubs (nodes containing a large amount of disease susceptibility information through the combination of multivariate main effects and multiple gene‐gene interactions in the network), and it provides the files needed for interactive visualization of a network using tools from our online Galaxy instance. We implemented these algorithms in C++ using OpenMP for shared‐memory parallel analysis on a server or desktop. To demonstrate Encore's utility in the analysis of genetic sequencing data, we present an analysis of exome resequencing data from healthy individuals and those with Systemic Lupus Erythematosus (SLE). Our results confirm the importance of the previously associated SLE genes HLA‐DRB and NCF2, which had the highest gene‐gene interaction degrees among the susceptibility hubs. An additional 14 genes previously associated with SLE emerged in our epistasis network model of the exome data, and three novel candidate genes, ST8SIA4, CMTM4, and C2CD4B, were implicated in the model. In summary, we present a comprehensive tool for epistasis network analysis and the first such analysis of exome data from a genetic study of SLE.
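The abstract describes susceptibility hubs as nodes that combine main effects with gene‐gene interactions, but it does not give the centrality computation itself. The following is a minimal sketch of one way such a score could be computed, assuming a generic PageRank-style power iteration over a symmetric matrix whose diagonal holds main effects and whose off-diagonal entries hold interaction strengths. The function name `centrality_scores`, the damping value, and the toy matrix are illustrative assumptions; this is not Encore's C++ implementation and not necessarily the exact SNPrank formula.

```python
def centrality_scores(gain, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style centrality on a symmetric interaction matrix.

    gain[i][i] is a (hypothetical) main-effect score for variant i and
    gain[i][j] is a (hypothetical) interaction strength between i and j.
    Returns a list of scores that sums to 1.
    """
    n = len(gain)
    # Total off-diagonal interaction weight attached to each node.
    col_sums = [sum(gain[i][j] for i in range(n) if i != j) for j in range(n)]
    # Teleport vector: proportional to main effects, uniform if all are zero.
    diag = [gain[i][i] for i in range(n)]
    diag_total = sum(diag)
    teleport = [d / diag_total for d in diag] if diag_total > 0 else [1.0 / n] * n

    r = [1.0 / n] * n
    for _ in range(max_iter):
        # Score held by nodes with no interactions is redistributed via teleport.
        dangling = sum(r[j] for j in range(n) if col_sums[j] == 0)
        new = []
        for i in range(n):
            # Score flowing into i from its interaction partners.
            flow = sum(gain[i][j] / col_sums[j] * r[j]
                       for j in range(n) if j != i and col_sums[j] > 0)
            new.append(damping * (flow + dangling * teleport[i])
                       + (1 - damping) * teleport[i])
        total = sum(new)
        new = [x / total for x in new]  # guard against floating-point drift
        if sum(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r


# Toy 3-variant matrix (made-up numbers): variant 1 has the largest
# main effect and interacts with both other variants.
toy_gain = [
    [0.10, 0.50, 0.00],
    [0.50, 0.30, 0.20],
    [0.00, 0.20, 0.05],
]
scores = centrality_scores(toy_gain)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking[0])  # index of the top-ranked variant in the toy network
```

In this sketch the damping parameter controls how much weight goes to interactions versus main effects: values near 1 emphasize network connectivity, while values near 0 reduce the ranking to the normalized main-effect scores.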
