Solving Complex Problems in Human Genetics using Nature-Inspired Algorithms Requires Strategies which Exploit Domain-Specific Knowledge

In human genetics, the availability of chip-based technology facilitates the measurement of thousands of DNA sequence variations from across the human genome. The informatics challenge is to identify combinations of interacting DNA sequence variations that predict common diseases. The authors review three nature-inspired methods that have been developed and evaluated in this domain. The two approaches this chapter focuses on in detail are genetic programming (GP) and a complex-system-inspired, GP-like computational evolution system (CES). The authors also discuss a third nature-inspired approach known as ant colony optimization (ACO). The GP and ACO techniques are designed to select relevant attributes, while the CES addresses both the selection of relevant attributes and the modeling of disease risk. Specifically, the authors examine these methods in the context of epistasis, or gene-gene interactions. The work discussed here focuses solely on the situation where there is an epistatic effect but no detectable main effect. In this domain, early studies show that nature-inspired algorithms perform no better than a simple random search when classification accuracy is used as the fitness function. Thus, the challenge in applying these search algorithms to this problem is that classification accuracy alone provides no building blocks. The goal then is to use outside knowledge or pre-processing of the dataset to provide these building blocks in a manner that enables the population, in a nature-inspired framework,
