Secondary phenotype analysis in ascertained family designs: application to the Leiden longevity study

The case-control design is often used to test associations between case-control status and genetic variants. In addition to this primary phenotype, a number of additional traits, known as secondary phenotypes, are routinely recorded, and associations between genetic factors and these secondary traits are typically studied as well. Analysing secondary phenotypes in case-control studies may lead to biased genetic effect estimates, especially when the marker tested is associated with the primary phenotype and when the primary and secondary phenotypes are correlated. Several methods have been proposed in the literature to overcome this problem, but they are limited to case-control studies and are not directly applicable to more complex designs, such as multiple-cases family studies. In such designs, a proper secondary phenotype analysis is complicated by the within-family correlations on top of the biased sampling design. We propose a novel approach that accommodates the ascertainment process while explicitly modelling the familial relationships. Our approach pairs existing methods for mixed-effects models with the retrospective likelihood framework and uses a multivariate probit model to capture the association between the mixed-type primary and secondary phenotypes. To examine the efficiency and bias of the estimates, we performed simulations under several scenarios for the association between the primary phenotype, secondary phenotype and genetic markers. We illustrate the method by analysing the association between two secondary phenotypes, triglyceride levels and glucose, and genetic markers from the Leiden Longevity Study, a multiple-cases family study that investigates longevity. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
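The bias described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the method proposed in the paper: it uses unrelated individuals rather than families, a liability-threshold (probit-type) model for the primary phenotype, and a naive linear regression of the secondary phenotype on the genotype. The function name `simulate_naive_bias` and all parameter values (allele frequency, effect sizes, residual correlation, threshold) are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_naive_bias(n_pop=200_000, n_per_group=8_000, maf=0.3,
                        beta_y=0.0, beta_d=0.5, rho=0.6,
                        threshold=1.6, seed=1):
    """Compare the naive genotype slope for a secondary phenotype in the
    full population with the slope in a 1:1 case-control sample.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Additive genotype coding (0, 1, 2 copies of the minor allele).
    g = rng.binomial(2, maf, size=n_pop)

    # Correlated standard-normal residuals for the secondary phenotype (e_y)
    # and the liability of the primary phenotype (e_d), correlation rho.
    e_y = rng.standard_normal(n_pop)
    e_d = rho * e_y + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n_pop)

    # Continuous secondary phenotype; beta_y = 0 means no true genetic effect.
    y = beta_y * g + e_y

    # Binary primary phenotype: case when the liability exceeds the threshold.
    d = (beta_d * g + e_d > threshold).astype(int)

    # Ascertainment: equal numbers of cases and controls, so cases are
    # heavily over-represented relative to the population.
    cases = np.flatnonzero(d == 1)
    controls = np.flatnonzero(d == 0)
    idx = np.concatenate([
        rng.choice(cases, n_per_group, replace=False),
        rng.choice(controls, n_per_group, replace=False),
    ])

    def slope(x, z):
        # Least-squares slope of z on x.
        return np.polyfit(x, z, 1)[0]

    return slope(g, y), slope(g[idx], y[idx])


pop_slope, cc_slope = simulate_naive_bias()
print(f"population slope:   {pop_slope:.3f}")
print(f"case-control slope: {cc_slope:.3f}")
```

With `beta_y = 0` the marker has no effect on the secondary phenotype, so the population slope should be close to zero, whereas the naive slope in the ascertained sample is pulled away from zero because the marker affects case status and the two phenotypes are correlated. Setting `beta_d = 0` or `rho = 0` in this sketch removes either condition, and the naive estimate should then be approximately unbiased.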
