High performance computing enabling exhaustive analysis of higher order single nucleotide polymorphism interaction in Genome Wide Association Studies

Genome-wide association studies (GWAS) are a common approach for the systematic discovery of single nucleotide polymorphisms (SNPs) that are associated with a given disease. The univariate analysis approaches commonly employed may miss important SNP associations in complex diseases that appear only under multivariate analysis. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers for exhaustive multivariate SNP analysis. Based on our results, we estimate that a three-way interaction analysis of GWAS data with 1.1 million SNPs would require over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based methods and four times faster than runtimes estimated for GPU-based methods, indicating how increasing the scale of hardware applied to interaction analysis may change the types of analysis that can be performed. Furthermore, assuming that linear scaling is maintained, as our results suggest, the same analysis would take under 3 months on "Sequoia", currently the largest IBM Blue Gene/Q supercomputer, at the Lawrence Livermore National Laboratory. Because the implementation used in this study can be optimised further, this runtime suggests that exhaustive analysis of higher order interactions in large modern GWAS is becoming feasible.
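To make the computational complexity concrete, the following Python sketch counts the SNP triples that an exhaustive three-way scan must test and reproduces the Sequoia projection under the linear-scaling assumption. Only the 1.1 million SNPs and the 5.8-year Avoca estimate come from the text above. The machine sizes (4 Blue Gene/Q racks for Avoca, 96 for Sequoia) are our assumptions and are not stated in this abstract, and `project_runtime` is a hypothetical helper, not part of the authors' framework.

```python
from math import comb

# Figures quoted in the abstract.
N_SNPS = 1_100_000      # SNPs in the GWAS data set
AVOCA_YEARS = 5.8       # estimated three-way runtime on the full Avoca system

# ASSUMPTION: machine sizes in Blue Gene/Q racks (not given in the abstract).
AVOCA_RACKS = 4
SEQUOIA_RACKS = 96

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# An exhaustive three-way scan tests every unordered triple of distinct SNPs
# once, i.e. C(n, 3) = n(n - 1)(n - 2) / 6 combinations.
n_triples = comb(N_SNPS, 3)


def project_runtime(years, racks_from, racks_to):
    """Hypothetical helper: rescale a runtime assuming perfectly linear scaling,
    so runtime is inversely proportional to the number of racks."""
    return years * racks_from / racks_to


sequoia_years = project_runtime(AVOCA_YEARS, AVOCA_RACKS, SEQUOIA_RACKS)
sequoia_months = sequoia_years * 12

# Average rate at which Avoca would have to evaluate triples to finish in 5.8 years.
triples_per_second = n_triples / (AVOCA_YEARS * SECONDS_PER_YEAR)

print(f"SNP triples to test: {n_triples:.3e}")
print(f"Projected Sequoia runtime: {sequoia_months:.1f} months")
print(f"Implied Avoca throughput: {triples_per_second:.3e} triples/s")
```

Running it reports about 2.218e+17 triples and a projected Sequoia runtime of 2.9 months, consistent with the "under 3 months" figure; the throughput line shows that the Avoca estimate corresponds to evaluating on the order of a billion SNP triples per second.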
