Number of SNP Loci Needed to Detect Population Structure

The study of the association of polymorphic genetic markers with common diseases is one of the most powerful tools in modern genetics. Interest in single nucleotide polymorphisms (SNPs) has grown steadily over the last decade. SNPs are currently the most developed markers in the human genome because they have a number of advantages over other marker types. One of the critical problems responsible for ‘spurious’ association findings in case-control studies is population stratification. Many statistical approaches have been developed for detecting population heterogeneity. However, the power of known methods to detect population structure depends strongly on the number of loci used. We analysed SNP data available in the public domain from The Single Nucleotide Consortia Ltd. (TSCL). Three populations, Afro-American, Asian and Caucasian, were compared, and the minimum number of SNP loci necessary to detect the population structure was estimated. Two clustering approaches, distance-based and model-based, were compared; the model-based approach was superior to the distance-based method. We found that more than 65 random SNP loci are required to identify distinct, geographically separated populations. Increasing the number of markers to over 100 raises the probability of correctly assigning an individual to an origin group to over 90%, even with conventional clustering methods.
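The dependence of assignment accuracy on the number of loci can be illustrated with a small simulation. The sketch below is not the authors' procedure and does not use the TSC data: it assumes three hypothetical populations whose allele frequencies are random perturbations of a shared ancestral frequency, and it uses plain k-means as a stand-in for a distance-based clustering method. All function names and parameters (`simulate`, `kmeans`, `accuracy`, `spread`, `n_per_pop`) are illustrative assumptions.

```python
import itertools
import random


def simulate(n_loci, n_per_pop=30, n_pops=3, spread=0.15, rng=None):
    """Simulate biallelic genotypes (0, 1 or 2 allele copies) for each individual.

    Assumption: each population's allele frequency at a locus is the shared
    ancestral frequency shifted by a uniform random amount in [-spread, spread].
    """
    rng = rng or random.Random(0)
    ancestral = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]
    genotypes, labels = [], []
    for pop in range(n_pops):
        freqs = [min(0.95, max(0.05, p + rng.uniform(-spread, spread)))
                 for p in ancestral]
        for _ in range(n_per_pop):
            # Two independent allele draws per locus give a genotype of 0, 1 or 2.
            genotypes.append([(rng.random() < f) + (rng.random() < f) for f in freqs])
            labels.append(pop)
    return genotypes, labels


def sq_dist(a, b):
    """Squared Euclidean distance between two genotype vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=50):
    """Plain k-means, standing in for a distance-based clustering method."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(n_iter):
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def accuracy(assign, labels, k):
    """Fraction of individuals correctly assigned under the best cluster-to-label mapping."""
    best = 0
    for perm in itertools.permutations(range(k)):
        hits = sum(perm[a] == t for a, t in zip(assign, labels))
        best = max(best, hits)
    return best / len(labels)


if __name__ == "__main__":
    for n_loci in (10, 65, 100, 200):
        rng = random.Random(42)
        genotypes, labels = simulate(n_loci, rng=rng)
        clusters = kmeans(genotypes, 3, rng)
        print(n_loci, round(accuracy(clusters, labels, 3), 2))
```

Because cluster labels are arbitrary, `accuracy` scores each clustering under the best of the six possible cluster-to-population mappings. The numbers such a simulation prints depend entirely on the assumed `spread` and sample sizes, so they should not be read as reproducing the thresholds of 65 and 100 loci reported above.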
