Optimal Use of Biological Expert Knowledge from Literature Mining in Ant Colony Optimization for Analysis of Epistasis in Human Disease

The fast measurement of millions of sequence variations across the genome is possible with the current technology. As a result, a difficult challenge arise in bioinformatics: the identification of combinations of interacting DNA sequence variations predictive of common disease [1]. The Multifactor Dimensionality Reduction (MDR) method is capable of analysing such interactions but an exhaustive MDR search would require exponential time. Thus, we use the Ant Colony Optimization (ACO) as a stochastic wrapper. It has been shown by Greene et al. that this approach, if expert knowledge is incorporated, is effective for analysing large amounts of genetic variation[2]. In the ACO method integrated in the MDR package, a linear and an exponential probability distribution function can be used to weigh the expert knowledge. We generate our biological expert knowledge from a network of gene-gene interactions produced by a literature mining platform, Pathway Studio. We show that the linear distribution function of expert knowledge is the most appropriate to weigh our scores when expert knowledge from literature mining is used. We find that ACO parameters significantly affect the power of the method and we suggest values for these parameters that can be used to optimize MDR in Genome Wide Association Studies that use biological expert knowledge.

[1]  Jason H. Moore,et al.  Exploiting the proteome to improve the genome-wide genetic analysis of epistasis in common human diseases , 2008, Human Genetics.

[2]  J. H. Moore,et al.  Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. , 2001, American journal of human genetics.

[3]  Jason H. Moore,et al.  The Ubiquitous Nature of Epistasis in Determining Susceptibility to Common Human Diseases , 2003, Human Heredity.

[4]  Jason H. Moore,et al.  GAMETES: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures , 2012, BioData Mining.

[5]  H. Cordell Detecting gene–gene interactions that underlie human diseases , 2009, Nature Reviews Genetics.

[6]  F. James Rohlf,et al.  Biometry: The Principles and Practice of Statistics in Biological Research , 1969 .

[7]  Arthur C. Sanderson,et al.  Bladder cancer SNP panel predicts susceptibility and survival , 2009, Human Genetics.

[8]  Sara Marsal,et al.  Identification of a two-loci epistatic interaction associated with susceptibility to rheumatoid arthritis through reverse engineering and multifactor dimensionality reduction. , 2007, Genomics.

[9]  Manuel López-Ibáñez,et al.  Ant colony optimization , 2010, GECCO '10.

[10]  Zhaohui S. Qin,et al.  A second generation human haplotype map of over 3.1 million SNPs , 2007, Nature.

[11]  J. H. Moore,et al.  Multifactor-dimensionality reduction shows a two-locus interaction associated with Type 2 diabetes mellitus , 2004, Diabetologia.

[12]  Bart Baesens,et al.  Editorial survey: swarm intelligence for data mining , 2010, Machine Learning.

[13]  M. Dorigo,et al.  1 Positive Feedback as a Search Strategy , 1991 .

[14]  Sergei Egorov,et al.  Pathway studio - the analysis and navigation of molecular networks , 2003, Bioinform..

[15]  Jason H. Moore,et al.  Optimal Use of Expert Knowledge in Ant Colony Optimization for the Analysis of Epistasis in Human Disease , 2009, EvoBIO.

[16]  Jason H. Moore,et al.  Genome-Wide Genetic Analysis Using Genetic Programming: The Critical Need for Expert Knowledge , 2007 .

[17]  Todd Holden,et al.  A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. , 2006, Journal of theoretical biology.

[18]  Jason H. Moore,et al.  Renin-angiotensin system gene polymorphisms and coronary artery disease in a large angiographic cohort: detection of high order gene-gene interaction. , 2007, Atherosclerosis.

[19]  Robert R. Sokal,et al.  The Principles and Practice of Statistics in Biological Research. , 1982 .