Control of confounding of genetic associations in stratified populations.

To control for hidden population stratification in genetic-association studies, statistical methods that use marker genotype data to infer population structure have been proposed as a possible alternative to family-based designs. In principle, it is possible to infer population structure from associations between marker loci and from associations of markers with the trait, even when no information about the demographic background of the population is available. In a model in which the total population is formed by admixture between two or more subpopulations, confounding can be estimated and controlled. Current implementations of this approach have limitations, the most serious of which is that they do not allow for uncertainty in estimates of individual admixture proportions or for lack of identifiability of subpopulations in the model. We describe methods that overcome these limitations by combining Bayesian and classical approaches, and we demonstrate the methods using data from three admixed populations (African American, African Caribbean, and Hispanic American) in which there is extreme confounding of trait-genotype associations because the trait under study (skin pigmentation) varies with admixture proportions. In these data sets, as many as one-third of marker loci show crude associations with the trait. Control for confounding by population stratification eliminates these associations, except at loci that are linked to candidate genes for the trait. With only 32 markers informative for ancestry, the efficiency of the analysis is 70%. These methods can deal with both confounding and selection bias in genetic-association studies, making family-based designs unnecessary.
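
The confounding mechanism described above can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not taken from the paper: two ancestral populations; one test marker and 32 ancestry-informative markers that all have allele frequencies 0.8 and 0.2 in the two populations; admixture proportions drawn from Beta(2, 5); and a trait that depends on admixture but on no marker. It compares a crude regression of the trait on test-marker genotype, a regression adjusted for a plug-in maximum-likelihood estimate of admixture from the 32 markers, and an "oracle" regression adjusted for the true admixture proportion. The plug-in step is the simple classical approach whose limitation the abstract points out (it ignores uncertainty in the estimates); the authors' Bayesian method is not reproduced here. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate(n=2000, n_markers=32, seed=1):
    """Toy two-population admixture model; every parameter here is hypothetical."""
    rng = random.Random(seed)
    p_pop1, p_pop2 = 0.8, 0.2  # allele-1 frequency in each ancestral population
    rows = []
    for _ in range(n):
        a = rng.betavariate(2, 5)  # admixture proportion from population 1
        q = a * p_pop1 + (1 - a) * p_pop2  # allele-1 frequency given admixture
        # Test marker, unlinked to the trait: 0, 1 or 2 copies of allele 1.
        g = sum(rng.random() < q for _ in range(2))
        # Ancestry-informative markers: total allele-1 count over 2 * n_markers alleles.
        count = sum(rng.random() < q for _ in range(2 * n_markers))
        # With identical marker frequencies the ML estimate has a closed form,
        # clipped to [0, 1] because the log-likelihood is concave in a.
        a_hat = (count / (2 * n_markers) - p_pop2) / (p_pop1 - p_pop2)
        a_hat = min(1.0, max(0.0, a_hat))
        y = 2.0 * a + rng.gauss(0, 0.3)  # trait depends on admixture only
        rows.append((g, a, a_hat, y))
    return rows


def invert(m):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    size = len(m)
    aug = [row[:] + [float(i == j) for j in range(size)] for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(size):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def ols_t(y, columns):
    """OLS of y on an intercept plus `columns`; returns (beta, t) for columns[0]."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[j][k] * xty[k] for k in range(p)) for j in range(p)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)
    se = math.sqrt(sigma2 * inv[1][1])
    return beta[1], beta[1] / se


rows = simulate()
g = [r[0] for r in rows]
a_true = [r[1] for r in rows]
a_est = [r[2] for r in rows]
y = [r[3] for r in rows]

crude_beta, crude_t = ols_t(y, [g])                # no control for stratification
plugin_beta, plugin_t = ols_t(y, [g, a_est])       # adjusted for estimated admixture
oracle_beta, oracle_t = ols_t(y, [g, a_true])      # adjusted for true admixture
print(f"crude t = {crude_t:.1f}, plug-in t = {plugin_t:.1f}, oracle t = {oracle_t:.1f}")
```

Under these assumptions one would expect a strong crude association at a marker that has no effect on the trait, which disappears when the true admixture proportion is used as a covariate. Adjusting for the plug-in estimate typically shrinks the association but can leave residual confounding, because measurement error in the estimated admixture proportions is ignored.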
