Data Simulation Software for Whole-Genome Association and Other Studies in Human Genetics

Genome-wide association studies have become a reality in the study of the genetics of complex disease. This technology provides a wealth of genomic information on patient samples, from which we hope to learn novel biology and detect important genetic and environmental factors for disease processes. Because strategies for analyzing these data have not kept pace with the laboratory methods that generate them, it is unlikely that these advances will immediately lead to an improved understanding of the genetic contribution to common human disease and drug response. Currently, no single analytical method will allow us to extract all information from a whole-genome association study; thus, many novel methods are being proposed and developed. For these new methods to succeed, it will be vital to be able to simulate datasets consisting of polymorphisms throughout the genome with realistic linkage disequilibrium patterns. Within these datasets, we can embed genetic models of disease and then evaluate the ability of novel methods to detect the simulated effects. This paper describes a new software package, genomeSIM, for the simulation of large-scale genomic data in population-based case-control samples. It allows single-SNP models, as well as gene-gene interaction models, to be associated with disease risk. We describe the algorithm and demonstrate its utility for future genetic studies of whole-genome association.
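
As a rough illustration of what embedding a disease model in simulated case-control data can look like, the following Python sketch draws SNP genotypes and assigns affection status from a two-locus penetrance table. It is not genomeSIM's algorithm: the function names, allele frequencies, and penetrance values are all hypothetical, and the sketch assumes independent loci in Hardy-Weinberg equilibrium, so it produces no linkage disequilibrium at all.

```python
import random


def draw_genotype(rng, maf):
    """Return the minor-allele count (0, 1 or 2) for one SNP.

    Assumes Hardy-Weinberg equilibrium: two independent allele draws.
    """
    return int(rng.random() < maf) + int(rng.random() < maf)


def simulate_case_control(n_cases, n_controls, mafs, penetrance, loci, seed=0):
    """Sample individuals until the requested cases and controls are filled.

    mafs       -- minor allele frequency per SNP (independent SNPs, no LD)
    penetrance -- dict mapping genotypes at the disease loci to P(affected)
    loci       -- indices of the SNPs that carry the disease model
    All penetrance values must lie strictly between 0 and 1, otherwise
    one of the two groups can never be filled.
    """
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        genotypes = [draw_genotype(rng, maf) for maf in mafs]
        key = tuple(genotypes[i] for i in loci)
        affected = rng.random() < penetrance[key]
        if affected and len(cases) < n_cases:
            cases.append(genotypes)
        elif not affected and len(controls) < n_controls:
            controls.append(genotypes)
    return cases, controls


# Hypothetical gene-gene interaction model: risk is elevated only when
# exactly one of the two disease loci is heterozygous.
interaction_penetrance = {
    (a, b): 0.5 if (a == 1) != (b == 1) else 0.05
    for a in range(3)
    for b in range(3)
}

# Ten SNPs with illustrative allele frequencies; SNPs 0 and 1 carry the model.
example_cases, example_controls = simulate_case_control(
    n_cases=50,
    n_controls=50,
    mafs=[0.3] * 10,
    penetrance=interaction_penetrance,
    loci=(0, 1),
    seed=42,
)
```

A single-SNP model fits the same function by passing one index in `loci` and a penetrance table keyed by one-element tuples. Realistic linkage disequilibrium would require correlated allele draws, which this sketch deliberately leaves out.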
