The application of Michigan-style learning classifier systems to address genetic heterogeneity and epistasis in association studies

Genetic epidemiologists, tasked with disentangling genotype-to-phenotype mappings, continue to struggle with a variety of phenomena that obscure the underlying etiologies of common complex diseases. For genetic association studies, genetic heterogeneity (GH) and epistasis (gene-gene interactions) are well-recognized phenomena that represent a difficult but accessible challenge for computational biologists. While progress has been made in addressing epistasis, methods for dealing with GH tend to "side-step" the problem: they are limited by a dependence on potentially arbitrary cutoffs or covariates and by the loss of power that accompanies data stratification. In the present study, we explore an alternative strategy, Learning Classifier Systems (LCSs), as a direct approach to characterizing and modeling disease in the presence of both GH and epistasis. This evaluation involves (1) implementing standardized versions of existing Michigan-style LCSs (XCS, MCS, and UCS), (2) examining major run parameters, and (3) performing quantitative and qualitative evaluations across a spectrum of simulated datasets. The results of this study highlight the strengths and weaknesses of the Michigan LCS architectures examined, provide proof of principle for the application of LCSs to the GH/epistasis problem, and lay the foundation for the development of an LCS algorithm specifically designed to address GH.
