Improving Strategy for Discovering Interacting Genetic Variants in Association Studies

Revealing the complex architecture underlying human diseases has received considerable attention in genetic epidemiology since researchers began exploring genotype-phenotype relationships. Identifying these relationships is challenging because multiple factors may act together or independently. In previous work, a deep neural network was trained to identify two-locus interacting single nucleotide polymorphisms (SNPs) associated with a complex disease. The model was assessed on all two-locus combinations under various simulated scenarios, and the results showed significant improvements in predicting SNP-SNP interactions over existing conventional machine learning techniques. These findings were also confirmed on a published dataset. However, the method's performance on higher-order interactions was unknown. The objective of this study is to validate the model for higher-order interactions in high-dimensional data. The method is further extended to unsupervised learning. A number of experiments were performed on simulated datasets under the same scenarios, as well as on a real dataset, to show the performance of the extended model. On average, the results show improved performance over the previous methods. The model is further evaluated on a sporadic breast cancer dataset to identify higher-order interactions between SNPs. The results rank the top 20 higher-order SNP interactions responsible for sporadic breast cancer.
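Moving from two-locus to higher-order interactions enlarges the search space combinatorially: with n SNPs there are C(n, k) candidate k-locus combinations to score and rank. The sketch below illustrates that exhaustive enumerate-score-rank loop. It is not the authors' deep neural network. As a simple stand-in scorer, it uses a chi-square statistic computed over the joint genotype cells of each combination. The 0/1/2 genotype coding, the function names (`chi_square_score`, `rank_interactions`), and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb


def chi_square_score(genotypes, labels, snps):
    """Stand-in score (NOT the paper's DNN): chi-square statistic of
    case/control counts across the joint genotype cells of `snps`.
    Assumes genotypes are coded 0/1/2 and labels are 0 (control) / 1 (case)."""
    n = len(labels)
    n_case = sum(labels)
    n_ctrl = n - n_case
    if n_case == 0 or n_ctrl == 0:
        return 0.0  # no contrast between cases and controls
    cell_totals, cell_cases = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        key = tuple(row[j] for j in snps)  # joint genotype at the chosen SNPs
        cell_totals[key] += 1
        cell_cases[key] += y
    stat = 0.0
    for key, total in cell_totals.items():
        obs_case = cell_cases[key]
        obs_ctrl = total - obs_case
        exp_case = total * n_case / n
        exp_ctrl = total * n_ctrl / n
        stat += (obs_case - exp_case) ** 2 / exp_case
        stat += (obs_ctrl - exp_ctrl) ** 2 / exp_ctrl
    return stat


def rank_interactions(genotypes, labels, order, top_n=20):
    """Score every `order`-locus SNP combination and return the top_n
    as (score, snp_index_tuple) pairs, highest score first."""
    n_snps = len(genotypes[0])
    scored = [
        (chi_square_score(genotypes, labels, combo), combo)
        for combo in combinations(range(n_snps), order)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_n]


# How fast the candidate space grows for 1,000 SNPs.
print(comb(1000, 2))  # 499500 two-locus combinations
print(comb(1000, 3))  # 166167000 three-locus combinations

# Toy data: disease status depends only on SNPs 0 and 1 jointly.
toy = [[a, b, c] for a in range(3) for b in range(3) for c in range(3)]
toy_labels = [(a + b) % 2 for a, b, _ in toy]
print(rank_interactions(toy, toy_labels, order=2, top_n=3))
```

In the toy run, the pair (0, 1) ranks first because it perfectly separates cases from controls. The two `comb` lines show why exhaustive scoring quickly becomes expensive as the interaction order rises, which is part of what makes higher-order detection in high-dimensional data difficult.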