Deep Learning Classification of Polygenic Obesity using Genome Wide Association Study SNPs

In this paper, association results from genome-wide association studies (GWAS) are combined with a deep learning framework to test the predictive capacity of statistically significant single nucleotide polymorphisms (SNPs) associated with the obesity phenotype. Our approach demonstrates the potential of deep learning as a powerful framework for GWAS analysis that can capture information about SNPs and the important interactions between them. Basic statistical methods and techniques for the analysis of genetic SNP data from population-based genome-wide studies are considered. Statistical association testing between individual SNPs and obesity was conducted under an additive model using logistic regression. After quality control (QC) and association analysis, four subsets of loci were selected by P-value threshold: P < 1×10⁻⁵ (5 SNPs), P < 1×10⁻⁴ (32 SNPs), P < 1×10⁻³ (248 SNPs) and P < 1×10⁻² (2465 SNPs). A deep learning classifier is initialised using these sets of SNPs and fine-tuned to classify obese and non-obese observations. Using the genetic variants with P-value < 1×10⁻² (2465 SNPs), the deep learning classifier achieved sensitivity (SE) = 0.9604, specificity (SP) = 0.9712, Gini = 0.9817, LogLoss = 0.1150, AUC = 0.9908 and MSE = 0.0300. As the P-value threshold became more stringent and fewer SNPs were retained, an evident deterioration in performance was observed. The results demonstrate that single-SNP analysis fails to capture the cumulative effect of less significant variants and their overall contribution to the outcome in disease prediction, which is captured using a deep learning framework.
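The threshold-based SNP selection and the reported evaluation metrics can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, SNP identifiers, and the 0.5 classification cut-off are assumptions made for illustration, and the Gini coefficient is assumed to follow the common definition Gini = 2·AUC − 1, which agrees with the reported values up to rounding (2 × 0.9908 − 1 = 0.9816).

```python
import math

# P-value cut-offs mirroring the four thresholds named in the abstract.
THRESHOLDS = (1e-5, 1e-4, 1e-3, 1e-2)


def select_snps(pvalues, thresholds=THRESHOLDS):
    """Map each threshold to the sorted SNP ids whose P-value falls below it.

    `pvalues` is a hypothetical dict {snp_id: association P-value}.
    """
    return {t: sorted(s for s, p in pvalues.items() if p < t) for t in thresholds}


def evaluate(y_true, y_prob, cutoff=0.5):
    """Compute SE, SP, AUC, Gini, LogLoss and MSE for binary labels.

    y_true holds 0/1 labels (1 = obese); y_prob holds predicted
    probabilities of class 1. The 0.5 cut-off is an assumption.
    """
    pairs = list(zip(y_true, y_prob))
    pos = [p for y, p in pairs if y == 1]
    neg = [p for y, p in pairs if y == 0]
    se = sum(p >= cutoff for p in pos) / len(pos)  # sensitivity
    sp = sum(p < cutoff for p in neg) / len(neg)   # specificity
    # AUC: probability that a random case is scored above a random
    # control, counting ties as one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    eps = 1e-15  # clamp probabilities to avoid log(0)
    logloss = -sum(
        y * math.log(min(max(p, eps), 1 - eps))
        + (1 - y) * math.log(min(max(1 - p, eps), 1 - eps))
        for y, p in pairs
    ) / len(pairs)
    mse = sum((y - p) ** 2 for y, p in pairs) / len(pairs)
    return {"SE": se, "SP": sp, "AUC": auc, "Gini": 2 * auc - 1,
            "LogLoss": logloss, "MSE": mse}


if __name__ == "__main__":
    # Toy P-values with made-up SNP ids, for illustration only.
    toy = {"rs_a": 5e-6, "rs_b": 5e-5, "rs_c": 5e-4, "rs_d": 5e-3, "rs_e": 0.2}
    for threshold, snps in select_snps(toy).items():
        print(threshold, snps)
    print(evaluate([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Because each cut-off is applied to the same P-values, the subsets are nested: every SNP selected at a stricter threshold also appears in the sets for the looser ones, so the larger sets add progressively less significant variants to the classifier's input.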
