Tuning Hyperparameters for Gene Interaction Models in Genome-Wide Association Studies

In genetic epidemiology, epistasis has been studied by many researchers seeking to understand the underlying causes of complex diseases. Identifying gene-gene and gene-environment interactions is becoming more challenging because multiple genetic and environmental factors act together or independently. The limitations of current computational approaches motivated the development of a deep learning method in our recent study. That approach trained a multilayer feedforward neural network to discover interacting genes associated with complex diseases. The models were evaluated under various simulated scenarios and compared with previous methods. The results showed significant improvements in predicting gene interactions over traditional machine learning techniques. The present study extends that work to maximize the predictive performance of the method by tuning its hyperparameters using Cartesian grid search and random grid search. Several experiments were conducted on real datasets to identify higher-order interacting genes responsible for diseases. The findings demonstrate that randomly chosen trials are more efficient than trials chosen by grid search for optimizing hyperparameters. The optimal configuration of hyperparameter values improved model performance without overfitting. The results identify the top 30 gene interactions associated with sporadic breast cancer and hypertension.
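The difference between the two search strategies can be illustrated with a minimal sketch. It is not the implementation used in the study: the search space, its values, and the `toy_score` function are hypothetical placeholders, and a real run would train and validate the neural network for each configuration. Cartesian grid search evaluates every combination of values, whereas random grid search, as assumed here, evaluates only a fixed number of configurations sampled without replacement from the same grid.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
SPACE = {
    "hidden_layers": [1, 2, 3],
    "units": [32, 64, 128],
    "learning_rate": [0.001, 0.01, 0.1],
    "dropout": [0.0, 0.2, 0.5],
}


def cartesian_grid(space):
    """Return every combination of hyperparameter values as a list of dicts."""
    names = list(space)
    return [dict(zip(names, values))
            for values in itertools.product(*space.values())]


def random_grid(space, n_trials, seed=0):
    """Sample n_trials distinct configurations from the Cartesian grid."""
    grid = cartesian_grid(space)
    rng = random.Random(seed)
    return rng.sample(grid, min(n_trials, len(grid)))


def toy_score(config):
    """Stand-in for validation accuracy (assumption, not a real model)."""
    return (-abs(config["learning_rate"] - 0.01)
            - 0.1 * abs(config["dropout"] - 0.2))


def best_config(configs, score_fn):
    """Return the configuration with the highest score."""
    return max(configs, key=score_fn)


full = cartesian_grid(SPACE)               # exhaustive: 3 * 3 * 3 * 3 trials
sampled = random_grid(SPACE, n_trials=20)  # fixed budget of 20 trials

best_full = best_config(full, toy_score)
best_sampled = best_config(sampled, toy_score)
```

The cost of the Cartesian grid grows multiplicatively with each added hyperparameter, while the random variant keeps the number of trials fixed by the budget, which is the trade-off the comparison in this study examines.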
