An analysis pipeline with statistical and visualization-guided knowledge discovery for Michigan-style learning classifier systems

Michigan-style learning classifier systems (M-LCSs) represent an adaptive and powerful class of evolutionary algorithms that distribute the learned solution over a sizable population of rules. However, their application to complex real-world data mining problems, such as genetic association studies, has been limited. Traditional knowledge discovery strategies for M-LCS rule populations involve sorting and manual rule inspection. While this approach may be sufficient for simpler problems, the confounding influence of noise and the need to discriminate between predictive and non-predictive attributes call for additional strategies. Additionally, tests of significance must be adapted to M-LCS analyses to make them a viable option in fields that require such analyses to assess confidence. In this work we introduce an M-LCS analysis pipeline that combines uniquely applied visualizations with objective statistical evaluation to identify predictive attributes and reliable rule generalizations in noisy single-step data mining problems. This work considers an alternative paradigm for knowledge discovery in M-LCSs, shifting the focus from individual rules to a global, population-wide perspective. We demonstrate the efficacy of this pipeline by applying it to the identification of epistasis (i.e., attribute interaction) and heterogeneity in noisy simulated genetic association data.
