Correcting for ascertainment biases when analyzing SNP data: applications to the estimation of linkage disequilibrium.

As large-scale sequencing efforts turn from single genome sequencing to polymorphism discovery, single nucleotide polymorphisms (SNPs) are becoming an increasingly important class of population genetic data. But because of the ascertainment biases introduced by many methods of SNP discovery, most SNP data cannot be analyzed using classical population genetic methods. Statistical methods must instead be developed that can explicitly take into account each method of SNP discovery. Here we review some of the current methods for analyzing SNPs and derive sampling distributions for single SNPs and pairs of SNPs for some common SNP discovery schemes. We also show that the ascertainment scheme has a large effect on the estimation of linkage disequilibrium and recombination, and describe some methods of correcting for ascertainment biases when estimating recombination rates from SNP data.
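To illustrate why the discovery scheme matters for even a single SNP, here is a minimal sketch, not the derivation from the paper itself. It assumes a standard neutral model in which the probability of a site carrying i derived copies among n sampled chromosomes is proportional to 1/i. It also assumes a hypothetical discovery scheme in which a SNP is retained only if it is variable in a random subsample of d of those n chromosomes. The function names are illustrative.

```python
from math import comb  # Python 3.8+; comb(a, b) is 0 when b > a


def neutral_sfs(n):
    """Frequency spectrum of segregating sites under the assumed
    standard neutral model: P(i copies in n) proportional to 1/i."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def prob_ascertained(i, n, d):
    """Probability that a site with i derived copies among n chromosomes
    is variable in a random subsample of d chromosomes (hypergeometric):
    one minus the chance the subsample is all derived or all ancestral."""
    return 1.0 - (comb(i, d) + comb(n - i, d)) / comb(n, d)


def ascertained_sfs(n, d):
    """Frequency spectrum conditional on discovery in the d-chromosome panel."""
    weights = [p * prob_ascertained(i, n, d)
               for i, p in zip(range(1, n), neutral_sfs(n))]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    n, d = 20, 2
    print("singleton fraction, no ascertainment:", neutral_sfs(n)[0])
    print("singleton fraction, panel of 2:      ", ascertained_sfs(n, d)[0])
```

Under these assumptions a small discovery panel under-represents rare alleles, so a method that treats the ascertained spectrum as if it were the unconditioned one would be biased. Conditioning on the discovery event, as in `ascertained_sfs`, is one way to account for this.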
