Solving Complex Problems in Human Genetics Using Genetic Programming: The Importance of Theorist-Practitioner-Computer Interaction

Genetic programming (GP) shows great promise for solving complex problems in human genetics. Unfortunately, many GP methods are not accessible to biologists. This is partly due to the complexity of the algorithms, which limits their ready adoption and integration into an analysis or modeling paradigm that might otherwise use only univariate statistical methods. It is also partly due to the lack of user-friendly, open-source, platform-independent, and freely available software packages designed for routine analysis by biologists. Our objective is to develop, distribute, and support a comprehensive software package that puts powerful GP methods for genetic analysis in the hands of geneticists. Our working hypothesis is that such a package would be used most effectively through interactive analysis by both a biologist and a computer scientist (i.e., human-human-computer interaction). We present here the design and implementation of an open-source software package called Symbolic Modeler (SyMod) that seeks to facilitate geneticist-bioinformaticist-computer interaction for problem solving in human genetics. We present the results of applying SyMod to real data and discuss the challenges associated with delivering a user-friendly GP-based software package to the genetics community.
