Methods for testing association between uncertain genotypes and quantitative traits.

Interpretability and power of genome-wide association studies can be increased by imputing unobserved genotypes, using a reference panel of individuals genotyped at higher marker density. For many markers, genotypes cannot be imputed with complete certainty, and the uncertainty needs to be taken into account when testing for association with a given phenotype. In this paper, we compare currently available methods for testing association between uncertain genotypes and quantitative traits. We show that some previously described methods offer poor control of the false-positive rate (FPR), and that satisfactory performance of these methods is obtained only by using ad hoc filtering rules or by using a harsh transformation of the trait under study. We propose new methods that are based on exact maximum likelihood estimation and use a mixture model to accommodate nonnormal trait distributions when necessary. The new methods adequately control the FPR and also have equal or better power compared to all previously described methods. We provide a fast software implementation of all the methods studied here; our new method requires less than one computer-day of computation for a typical genome-wide scan with 2.5 million single nucleotide polymorphisms (SNPs) and 5000 individuals.
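To make the idea of a likelihood-based test on uncertain genotypes concrete, the following is a minimal sketch, not the authors' software. It assumes an additive genotype coding (0, 1, 2), normally distributed residuals, and per-individual genotype probabilities supplied by an imputation program; it does not include the mixture model for nonnormal traits mentioned above. The names `log_lik` and `mixture_lrt` are hypothetical. The likelihood of each trait value is summed over the three possible genotypes, weighted by their imputed probabilities, and maximised with a simple EM iteration before forming a 1-degree-of-freedom likelihood ratio test.

```python
import math
import numpy as np

GENOTYPES = np.array([0.0, 1.0, 2.0])  # assumed additive coding: allele counts


def log_lik(y, probs, mu, beta, sigma2):
    """Log-likelihood of the trait, summed over the uncertain genotype."""
    resid = y[:, None] - mu - beta * GENOTYPES[None, :]
    dens = np.exp(-0.5 * resid**2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
    return float(np.sum(np.log(np.sum(probs * dens, axis=1))))


def mixture_lrt(y, probs, n_iter=200, tol=1e-10):
    """Likelihood ratio test of beta = 0 given genotype probabilities.

    y     : (n,) quantitative trait
    probs : (n, 3) imputed probabilities of genotypes 0, 1, 2 (rows sum to 1)
    Returns (beta_hat, lrt_statistic, p_value).
    Illustrative only: no handling of monomorphic markers or covariates.
    """
    y = np.asarray(y, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n = y.size

    # Null model (beta = 0): the maximum likelihood fit is the sample
    # mean and the (biased) sample variance.
    mu0, s0 = y.mean(), y.var()
    ll0 = log_lik(y, probs, mu0, 0.0, s0)

    # Start EM at the null fit so the likelihood can only increase from ll0.
    mu, beta, sigma2, ll = mu0, 0.0, s0, ll0
    for _ in range(n_iter):
        # E-step: posterior genotype weights given the trait value.
        resid = y[:, None] - mu - beta * GENOTYPES[None, :]
        w = probs * np.exp(-0.5 * resid**2 / sigma2)
        w /= w.sum(axis=1, keepdims=True)

        # M-step: weighted least squares over the three genotype classes.
        d = w @ GENOTYPES          # posterior expected allele count
        q = w @ GENOTYPES**2       # posterior expected squared count
        A = np.array([[n, d.sum()], [d.sum(), q.sum()]])
        b = np.array([y.sum(), (y * d).sum()])
        mu, beta = np.linalg.solve(A, b)
        resid = y[:, None] - mu - beta * GENOTYPES[None, :]
        sigma2 = float(np.sum(w * resid**2) / n)

        new_ll = log_lik(y, probs, mu, beta, sigma2)
        converged = new_ll - ll < tol
        ll = new_ll
        if converged:
            break

    stat = max(0.0, 2.0 * (ll - ll0))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return float(beta), stat, p_value
```

When every genotype is known with certainty (each row of `probs` has a single 1), the sketch reduces to ordinary least squares regression of the trait on the allele count, which gives a convenient sanity check. A simpler alternative often used in practice is to regress the trait on the expected allele count (the dosage); the sketch differs in that it lets the trait value inform the genotype weights at each iteration.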
