Robust analysis of secondary phenotypes in case‐control genetic association studies

The case-control study is a common design for assessing the association between genetic exposures and a disease phenotype. Although association with the primary (case-control) phenotype is the main interest, there is often considerable interest in assessing relationships between genetic exposures and other (secondary) phenotypes. However, a case-control sample is a biased sample from the general population, because cases are over-represented relative to their population frequency. If this sampling scheme is not correctly taken into account, estimates of the effect of exposures on secondary phenotypes can be biased, leading to incorrect inference. In this paper, we address this problem and propose a general approach for estimating and testing the population effect of a genetic variant on a secondary phenotype. Our approach is based on inverse probability weighted estimating equations, where the weights depend on genotype and the secondary phenotype. We show that, although our approach is slightly less efficient than a full likelihood-based analysis when the likelihood is correctly specified, it is substantially more robust to model misspecification and can outperform likelihood-based analysis, in terms of both validity and power, when the model is misspecified. We illustrate our approach with an application to a case-control study extracted from the Framingham Heart Study. Copyright © 2016 John Wiley & Sons, Ltd.
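To make the idea of a weighted estimating equation concrete, the following is a minimal sketch of the simplest form of inverse probability weighting for a continuous secondary phenotype. It is not the estimator proposed in the paper: the weights here depend only on case status and on a disease prevalence that is assumed to be known, whereas the paper's weights depend on genotype and the secondary phenotype. The function names `ipw_weights` and `ipw_linear_fit` and the linear working model are illustrative assumptions.

```python
import numpy as np


def ipw_weights(case, prevalence):
    """Weights proportional to the inverse of each subject's sampling probability.

    Assumes (hypothetically) that the population disease prevalence is known
    and that cases and controls were each sampled at random.
    """
    case = np.asarray(case, dtype=bool)
    frac_cases = case.mean()  # share of cases in the case-control sample
    w_case = prevalence / frac_cases
    w_ctrl = (1.0 - prevalence) / (1.0 - frac_cases)
    return np.where(case, w_case, w_ctrl)


def ipw_linear_fit(genotype, phenotype, weights):
    """Solve the weighted estimating equation
        sum_i w_i * x_i * (y_i - x_i' beta) = 0
    for a linear working model y = beta0 + beta1 * g.
    Returns the array [beta0, beta1].
    """
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    w = np.asarray(weights, dtype=float)
    X = np.column_stack([np.ones_like(g), g])  # design matrix with intercept
    XtW = X.T * w                              # scales each subject's column by w_i
    return np.linalg.solve(XtW @ X, XtW @ y)
```

In use, one would compute `w = ipw_weights(case, prevalence)` on the case-control sample and pass it to `ipw_linear_fit(genotype, phenotype, w)`; the second element of the result is the weighted estimate of the genotype effect on the secondary phenotype. Valid standard errors would need a sandwich (robust) variance estimator, which this sketch omits.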
