CUDA-LR: CUDA-accelerated logistic regression analysis tool for gene-gene interaction for genome-wide association study

In genome-wide association studies (GWAS), logistic regression (LR) is the most commonly used method for finding associations between a disease phenotype and genetic variants such as single nucleotide polymorphisms (SNPs). Because fitting a logistic regression model requires an iterative algorithm to obtain the parameter estimates, its application to GWAS has largely been limited to testing individual SNPs, and LR has seen little use in multiple-SNP analysis, including gene-gene interaction analysis, on large-scale GWAS data. To overcome this computational burden, we developed a logistic regression analysis tool named CUDA-LR, built on CUDA, a programming architecture for graphics processing units (GPUs). CUDA-LR supports not only the simple single-SNP model but also a more complex two-SNP model that includes the interaction term. In addition, CUDA-LR provides various parameters for gaining further acceleration and for performing specific analyses. In a comparison with other methods, CUDA-LR showed an almost 700-fold acceleration and highly reliable results, owing to our GPU-specific optimization techniques. We believe that CUDA-LR is a useful logistic regression analysis tool for interaction analysis of large-scale GWAS datasets.
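
To make the computational burden concrete, the following is a minimal CPU sketch of fitting the two-SNP model with an interaction term for a single SNP pair. It is not the CUDA-LR implementation. The abstract does not say which iterative algorithm is used, so Newton-Raphson is assumed here as a standard choice. The function names `solve` and `fit_pair`, the numeric genotype coding, and the convergence settings are likewise illustrative assumptions.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_pair(snp1, snp2, y, max_iter=25, tol=1e-8):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*g1 + b2*g2 + b3*g1*g2.

    snp1, snp2: numeric genotype codes per individual (assumed coding).
    y: 0/1 disease status per individual.
    Returns [b0, b1, b2, b3]; b3 is the interaction coefficient.
    """
    X = [[1.0, g1, g2, g1 * g2] for g1, g2 in zip(snp1, snp2)]
    p = 4
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))  # predicted probability
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]          # score vector
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]   # information matrix
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

Each call runs several passes over all individuals for one SNP pair, and a genome-wide interaction scan repeats this for every pair. Since each pair can be fitted independently of the others, the workload lends itself to parallel execution of the kind a GPU provides.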
