A novel approach to simulate gene-environment interactions in complex diseases

Background: Complex diseases are multifactorial traits caused by both genetic and environmental factors. They account for the majority of human diseases and include those with the highest prevalence and mortality (cancer, heart disease, obesity, etc.). Although a large amount of information has been collected on both genetic and environmental risk factors, the epidemiological literature contains few studies of their interactions. One reason may be incomplete knowledge of the power of the statistical methods designed to detect risk factors and their interactions in these data sets. Progress in this direction would lead to a better understanding and description of gene-environment interactions. One possible strategy is to test the different statistical methods on data sets in which the underlying phenomenon is completely known and fully controllable, such as simulated ones.

Results: We present a mathematical approach that models gene-environment interactions. With this method it is possible to generate simulated populations with gene-environment interactions of any form, involving any number of genetic and environmental factors and allowing non-linear interactions such as epistasis. In particular, we implemented a simple version of this model in the Gene-Environment iNteraction Simulator (GENS), a tool designed to simulate case-control data sets in which a one gene-one environment interaction influences disease risk. The main aims were to allow population characteristics to be entered using standard epidemiological measures and to implement constraints that make the simulator's behaviour biologically meaningful.

Conclusions: With the multi-logistic model implemented in GENS, it is possible to simulate case-control samples of complex diseases in which gene-environment interactions influence disease risk. The user has full control over the main characteristics of the simulated population, and a Monte Carlo process introduces random variability. A knowledge-based approach reduces the complexity of the mathematical model by applying reasonable biological constraints and makes the simulation easier to interpret in biological terms. Simulated data sets can be used to assess novel statistical methods or to evaluate statistical power when designing a study.
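As an illustration only, the kind of simulation described above can be sketched in a few lines of Python. This is not the GENS implementation. It assumes one reading of "multi-logistic", namely a separate logistic risk curve for each genotype at a single biallelic locus, together with Hardy-Weinberg genotype frequencies and a standard normal environmental exposure. All function names, parameter names (`maf`, `alpha`, `beta`) and numeric values are hypothetical and chosen for the example.

```python
import math
import random


def logistic(z):
    """Standard logistic function, mapping any real z to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def draw_genotype(rng, maf):
    """Count of minor alleles (0, 1 or 2), assuming Hardy-Weinberg equilibrium."""
    return int(rng.random() < maf) + int(rng.random() < maf)


def disease_risk(genotype, exposure, alpha, beta):
    """Assumed model: one logistic curve per genotype.

    alpha[g] sets the baseline log-odds for genotype g, and beta[g] sets how
    strongly the exposure changes the log-odds for that genotype. Unequal
    beta values across genotypes encode a gene-environment interaction.
    """
    return logistic(alpha[genotype] + beta[genotype] * exposure)


def simulate_case_control(n_cases, n_controls, maf, alpha, beta, seed=0):
    """Monte Carlo sampling of individuals until both quotas are filled."""
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        g = draw_genotype(rng, maf)
        e = rng.gauss(0.0, 1.0)  # assumed standard normal exposure
        affected = rng.random() < disease_risk(g, e, alpha, beta)
        bucket, quota = (cases, n_cases) if affected else (controls, n_controls)
        if len(bucket) < quota:
            bucket.append((g, e))
    return cases, controls


# Hypothetical parameters: same baseline risk for every genotype, but the
# effect of the exposure grows with the number of minor alleles.
alpha = [-3.0, -3.0, -3.0]
beta = [0.2, 0.6, 1.0]
cases, controls = simulate_case_control(200, 200, maf=0.3, alpha=alpha, beta=beta)
print(len(cases), len(controls))
```

Each simulated individual is a `(genotype, exposure)` pair, so the two lists can be passed directly to whatever statistical method is being assessed. Fixing `seed` makes a run reproducible, while varying it yields independent replicates for power estimation.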
