A classification and characterization of two-locus, pure, strict, epistatic models for simulation and detection

Background
The statistical genetics phenomenon of epistasis is widely acknowledged to confound disease etiology. Evaluating strategies for detecting these complex multi-locus disease associations requires simulation studies. The GAMETES software for generating complex genetic models has provided the means to randomly generate an architecturally diverse population of epistatic models that are both pure and strict, i.e., all n loci, but no fewer, are predictive of phenotype. Previous theoretical work characterizing complex genetic models has yet to examine pure, strict epistasis, which should be the most challenging to detect. This study addresses three goals: (1) classify and characterize pure, strict, two-locus epistatic models, (2) investigate the effect of model ‘architecture’ on detection difficulty, and (3) explore how adjusting GAMETES constraints influences diversity in the generated models.

Results
We used a geometric approach to classify pure, strict, two-locus epistatic models by “shape”. In total, 33 unique shape symmetry classes were identified. Using a detection difficulty metric, we found that model shape was consistently a significant predictor of model detection difficulty. Additionally, after categorizing shape classes by the number of edges in their shape projections, we found that this edge number was also significantly predictive of detection difficulty. Analysis of constraints within GAMETES indicated that increasing model population size can expand model class coverage but does little to change the range of observed difficulty metric scores. A variable population prevalence significantly increased the range of observed difficulty metric scores and, for certain constraints, also improved model class coverage.

Conclusions
These analyses further our theoretical understanding of epistatic relationships and uncover guidelines for the effective generation of complex models using GAMETES. Specifically, (1) we have characterized 33 shape classes by edge number, detection difficulty, and observed frequency, (2) our results support the claim that model architecture directly influences detection difficulty, and (3) we found that GAMETES generates a maximally diverse set of models with a variable population prevalence and a larger model population size. However, a model population size as small as 1,000 is likely to be sufficient.
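
To make the notion of a pure two-locus model concrete, the following minimal Python sketch checks whether a 3x3 penetrance table shows no single-locus (marginal) effects. It is an illustration only, not the GAMETES algorithm: the function names, the Hardy-Weinberg assumption, the tolerance, and the example tables are all assumptions introduced here. For two loci, the "no fewer" condition reduces to neither locus being predictive on its own, which is what the marginal check tests.

```python
from itertools import product


def genotype_freqs(maf):
    """Genotype frequencies for 0, 1, 2 minor alleles, assuming Hardy-Weinberg."""
    q = maf
    p = 1.0 - q
    return [p * p, 2.0 * p * q, q * q]


def prevalence(table, freqs_a, freqs_b):
    """Population prevalence K: penetrance averaged over all nine genotypes."""
    return sum(
        table[i][j] * freqs_a[i] * freqs_b[j]
        for i, j in product(range(3), repeat=2)
    )


def marginal_penetrances(table, freqs_a, freqs_b):
    """Marginal penetrance of each genotype at locus A (rows) and locus B (columns)."""
    rows = [sum(table[i][j] * freqs_b[j] for j in range(3)) for i in range(3)]
    cols = [sum(table[i][j] * freqs_a[i] for i in range(3)) for j in range(3)]
    return rows, cols


def is_pure(table, maf_a, maf_b, tol=1e-9):
    """True if every marginal penetrance equals K, i.e. no locus has a main effect."""
    fa, fb = genotype_freqs(maf_a), genotype_freqs(maf_b)
    k = prevalence(table, fa, fb)
    rows, cols = marginal_penetrances(table, fa, fb)
    return all(abs(m - k) <= tol for m in rows + cols)


def is_predictive(table, tol=1e-9):
    """True if the two loci jointly carry signal (the table is not constant)."""
    cells = [x for row in table for x in row]
    return max(cells) - min(cells) > tol


# Hypothetical example tables (rows: locus A genotype, columns: locus B genotype).
XOR_LIKE = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
ADDITIVE = [
    [0.0, 0.1, 0.2],
    [0.1, 0.2, 0.3],
    [0.2, 0.3, 0.4],
]

if __name__ == "__main__":
    for name, table in [("XOR-like", XOR_LIKE), ("additive", ADDITIVE)]:
        pure = is_pure(table, 0.5, 0.5)
        print(name, "pure:", pure, "jointly predictive:", is_predictive(table))
```

With both minor allele frequencies set to 0.5, the XOR-like table has identical marginal penetrances at every genotype of both loci, so neither locus is informative alone even though the pair is. The additive table fails the check because its marginals differ by genotype.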
