Role of genetic heterogeneity and epistasis in bladder cancer susceptibility and outcome: a learning classifier system approach

Background and objective: Detecting complex patterns of association between genetic or environmental risk factors and disease risk has become an important target for epidemiological research. In particular, strategies that can model multifactor interactions or heterogeneous patterns of association can offer new insights into association studies for which traditional analytic tools have had limited success.

Materials and methods: To examine both phenomena concurrently, previous work successfully applied learning classifier systems (LCSs). LCSs are a flexible class of evolutionary algorithms that distribute learned associations over a population of rules. Subsequent work addressed the inherent problems of knowledge discovery and interpretation in these algorithms, which made it possible to characterize heterogeneous patterns of association. These earlier advances were evaluated in complex simulation studies. The present study applies them to a ‘real-world’ genetic epidemiology study of bladder cancer susceptibility.

Results and discussion: We replicated the identification of previously characterized factors that modify bladder cancer risk: single nucleotide polymorphisms from a DNA repair gene, and smoking. We also identified potentially heterogeneous groups of subjects characterized by distinct patterns of association. Cox proportional hazards models compared clinical outcome variables between the cases of the two largest groups. These models yielded a significant, meaningful difference in survival time in years (survivorship) and a marginally significant difference in recurrence time. These results support the hypothesis that an LCS approach can offer greater insight into complex patterns of association.

Conclusions: This methodology appears well suited to dissecting disease heterogeneity, a key component in the advancement of personalized medicine.
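The idea that an LCS "distributes learned associations over a population of rules" can be illustrated with a toy sketch of Michigan-style rule matching. This is not the authors' implementation. The attribute names, rule conditions, and fitness values are all hypothetical. The learning machinery is omitted: rule discovery by a genetic algorithm, covering, and fitness updates. Each rule specifies only some attributes and uses "#" as a wildcard for the rest. A prediction is a fitness-weighted vote among the rules that match a subject.

```python
from dataclasses import dataclass

WILDCARD = "#"  # "don't care": the rule ignores this attribute


@dataclass
class Rule:
    condition: tuple  # one entry per attribute: a genotype code (0, 1, 2) or WILDCARD
    action: int       # predicted class: 1 = case, 0 = control
    fitness: float    # hypothetical accuracy-based fitness in [0, 1]

    def matches(self, instance):
        """True when every attribute the rule specifies equals the instance's value."""
        return all(c == WILDCARD or c == v for c, v in zip(self.condition, instance))


def match_set(population, instance):
    """All rules whose condition covers the instance."""
    return [rule for rule in population if rule.matches(instance)]


def predict(population, instance):
    """Fitness-weighted vote over the match set; None when no rule matches."""
    votes = {}
    for rule in match_set(population, instance):
        votes[rule.action] = votes.get(rule.action, 0.0) + rule.fitness
    if not votes:
        return None
    return max(votes, key=votes.get)


# Hypothetical attributes, in order: SNP_A, SNP_B, SNP_C, smoker (0/1).
population = [
    Rule((2, 2, WILDCARD, WILDCARD), 1, 0.9),         # two-SNP interaction predicts case
    Rule((WILDCARD, WILDCARD, 1, 1), 1, 0.8),         # a different pattern: SNP_C plus smoking
    Rule((WILDCARD, WILDCARD, WILDCARD, 0), 0, 0.6),  # non-smokers lean control
]

print(predict(population, (2, 2, 0, 0)))  # 1: first rule (0.9) outvotes third (0.6)
print(predict(population, (0, 1, 1, 1)))  # 1: only the second rule matches
print(predict(population, (0, 0, 0, 0)))  # 0: only the third rule matches
print(predict(population, (0, 0, 0, 1)))  # None: no rule covers this subject
```

The two subjects predicted as cases are explained by different rules. This is a toy version of the heterogeneous patterns of association described above: no single model has to fit every case.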

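The abstract compares survival and recurrence between the cases in the two largest groups using Cox proportional hazards models. The sketch below shows what such a comparison estimates under an explicit assumption: a single binary group indicator with no other covariates. The abstract does not state which covariates or which tie-handling method were used. The helper `cox_binary` is hypothetical. It fits the log hazard ratio by Newton-Raphson on the partial log-likelihood, using the Breslow approximation for ties. The data are invented for illustration and do not come from the bladder cancer study. In practice, an established implementation such as R's `survival::coxph` or the Python `lifelines` package is the better choice.

```python
import math


def cox_binary(times, events, group, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) for a single 0/1 covariate x.

    Uses Newton-Raphson on the Cox partial log-likelihood with the Breslow
    approximation for tied event times. Returns (beta, approximate standard error).
    """
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored subjects contribute only through risk sets
            # Risk set: subjects still under observation at this event time.
            s0 = s1 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * group[j])
                    s0 += w
                    s1 += w * group[j]
            mean_x = s1 / s0  # weighted share of group-1 subjects at risk
            score += group[i] - mean_x
            info += mean_x - mean_x ** 2  # binary x, so E[x^2] == E[x]
        if info == 0.0:
            raise ValueError("no event time has both groups at risk; beta is not identifiable")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


# Invented toy data: follow-up time, event indicator (0 = censored), group.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]
group = [1, 0, 1, 1, 0, 0]

beta, se = cox_binary(times, events, group)
print(f"hazard ratio (group 1 vs 0): {math.exp(beta):.2f}, SE(beta): {se:.2f}")
```

Because group-1 subjects tend to fail earlier in this toy data, the estimated hazard ratio exceeds 1. Swapping the group labels gives the same fit with the sign of beta reversed.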