Evaluation of Parameter Contribution to Neural Network Size and Fitness in ATHENA for Genetic Analysis

The vast amount of available genomics data provides an unprecedented ability to survey the entire genome and search for the genetic determinants of complex diseases. Until now, genome-wide association studies (GWAS) have been the predominant method for associating DNA variations with disease traits. GWAS have successfully uncovered many genetic variants associated with complex diseases when the effect loci are strongly associated with the trait. However, methods for studying interaction effects among multiple loci are still lacking. Established machine learning methods such as grammatical evolution neural networks (GENN) can be adapted to help uncover the interaction effects that GWAS do not capture. We used an implementation of GENN distributed in the software package ATHENA (Analysis Tool for Heritable and Environmental Network Associations) to investigate the effects of multiple GENN parameters and data noise levels on model detection and network structure. We concluded that the models produced by GENN were greatly affected by algorithm parameters and data noise levels. We also produced complex, multi-layer networks that were not produced in the previous study. In summary, GENN can produce complex, multi-layer networks when the data require it for higher fitness and when the parameter settings allow for a wide search of the complex model space.
