Backward simulation of ancestors of sampled individuals.

If the population is large and the sampling mechanism is random, the coalescent is commonly used to model the haplotypes in the sample. Ordered genotypes can then be formed by randomly matching the derived haplotypes. However, this approach is not realistic when (1) there is a departure from random mating (e.g., dominant individuals in breeding populations or monogamy in humans), or (2) the population is small and/or the individuals in the sample are ascertained by a particular non-random sampling scheme, as is usually the case in the statistical modeling and analysis of pedigree data. For such situations, we present a data generation method in which an ancestral graph with non-overlapping generations is first generated backwards in time, using ideas from coalescent theory. Alleles are then randomly assigned to the founders, and the gene flow over the entire genome is simulated forwards in time by dropping alleles down the graph according to a recombination model without interference. The parameters controlling the mating behavior of the generated individuals in the graph (degree of monogamy) can be tuned to match a particular demographic situation, without restriction to simple random mating. The performance of the approach is illustrated with a simulation example. The software (written in C) is freely available for research purposes at http://www.rni.helsinki.fi/~dag/.
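The two stages described above (backward generation of the ancestral graph, then forward gene dropping) can be illustrated with a minimal Python sketch. It is not the authors' C software, and its details are assumptions made for illustration: the function names, the fixed number of males and females per generation (`n_per_sex`), and the way the `monogamy` parameter is applied (a father who already has a mate in a given generation reuses her with that probability) are all hypothetical. Founder haplotypes carry unique integer labels as a stand-in for randomly assigned alleles, and recombination without interference is modeled at marker level with Haldane's map function.

```python
import math
import random


def simulate_ancestors(n_sample, n_gen, n_per_sex, monogamy, rng):
    """Backward step: pick parents generation by generation (hypothetical scheme).

    Returns (parents, founders), where parents maps child -> (father, mother)
    and founders is the list of ancestors in the oldest generation.
    """
    current = [(0, "s", i) for i in range(n_sample)]  # sampled individuals
    parents = {}
    for g in range(1, n_gen + 1):
        partner = {}  # father -> first mate chosen for him in this generation
        for child in current:
            father = (g, "m", rng.randrange(n_per_sex))
            if father in partner and rng.random() < monogamy:
                mother = partner[father]  # reuse the existing mate
            else:
                mother = (g, "f", rng.randrange(n_per_sex))
                partner.setdefault(father, mother)
            parents[child] = (father, mother)
        # Only ancestors of the sample are carried further back in time.
        current = sorted({p for c in current for p in parents[c]})
    return parents, current


def haldane(d_morgans):
    """Recombination fraction for map distance d under no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def gamete(hap_pair, rec_fracs, rng):
    """Form one gamete: start on a random strand, switch at each crossover."""
    strand = rng.randrange(2)
    out = [hap_pair[strand][0]]
    for k, r in enumerate(rec_fracs, start=1):
        if rng.random() < r:
            strand = 1 - strand
        out.append(hap_pair[strand][k])
    return out


def drop_alleles(parents, founders, positions, rng):
    """Forward step: drop founder labels down the graph at the given markers."""
    rec = [haldane(b - a) for a, b in zip(positions, positions[1:])]
    n_loci = len(positions)
    haps = {}
    for j, f in enumerate(founders):
        # Assumption: unique labels 2j and 2j+1 mark the founder's haplotypes.
        haps[f] = ([2 * j] * n_loci, [2 * j + 1] * n_loci)
    # Process the oldest non-founder generation first, the sample last.
    for child in sorted(parents, key=lambda c: -c[0]):
        father, mother = parents[child]
        haps[child] = (gamete(haps[father], rec, rng),
                       gamete(haps[mother], rec, rng))
    return haps
```

A call such as `simulate_ancestors(6, 4, 10, 0.8, random.Random(1))` followed by `drop_alleles(parents, founders, [0.0, 0.1, 0.5, 1.0], rng)` yields, for each sampled individual, an ordered pair of haplotypes (paternal, maternal) expressed in founder labels. Actual allelic states could then be obtained by mapping each founder label to an allele drawn from chosen population frequencies.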
