Logistic Bayesian LASSO for Genetic Association Analysis of Data from Complex Sampling Designs

Detecting gene–environment interactions with rare variants is critical to dissecting the etiology of common diseases. Interactions with rare haplotype variants (rHTVs) are of particular interest. At the same time, complex sampling designs, such as stratified random sampling, are becoming increasingly popular in case–control studies, especially for recruiting controls. The US Kidney Cancer Study (KCS) is an example: all available cases were included, while the controls at each site were randomly selected from the population by frequency matching with cases on age, sex and race. There is currently no rHTV association method that can account for such a complex sampling design. To fill this gap, we consider logistic Bayesian LASSO (LBL), an existing rHTV approach for case–control data, and show that its model can easily accommodate the complex sampling design. We study two extensions: one includes the stratifying variables as main effects only, and the other additionally models their interactions with haplotypes. We conduct extensive simulation studies to compare the complex sampling methods with the original LBL methods. We find that, when there is no interaction between haplotype and stratifying variables, both extensions perform well, whereas the original LBL methods lead to inflated type I error rates. However, when such an interaction exists, the interaction effect must be included in the model to control the type I error rate. Finally, we analyze the KCS data and find a significant interaction between current smoking and a specific rHTV in the N-acetyltransferase 2 gene.
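
One way to picture the two extensions is as logistic regression models that differ only in whether the stratifying variables interact with haplotypes. The sketch below is an illustrative assumption, not the paper's exact specification: the symbols $X_H$ (haplotype design vector), $E$ (environmental covariate), $S$ (vector of stratifying variables such as age, sex and race) and all coefficient names are hypothetical notation, and the double-exponential prior shown is the generic Bayesian LASSO form rather than a confirmed detail of LBL.

```latex
% Hypothetical notation: X_H, E, S, \alpha, \beta, \gamma, \Delta, \lambda are assumed for illustration.
\begin{align*}
\text{Main effects only:}\quad
\operatorname{logit} P(Y = 1 \mid X_H, E, S)
  &= \alpha + X_H^\top \beta_H + \beta_E E + E\, X_H^\top \beta_{HE} + S^\top \gamma, \\
\text{With interactions:}\quad
\operatorname{logit} P(Y = 1 \mid X_H, E, S)
  &= \alpha + X_H^\top \beta_H + \beta_E E + E\, X_H^\top \beta_{HE} + S^\top \gamma + X_H^\top \Delta\, S, \\
\text{Shrinkage prior (generic form):}\quad
\pi(\beta_j \mid \lambda) &= \frac{\lambda}{2} \exp\!\left(-\lambda\, |\beta_j|\right).
\end{align*}
```

In this notation, setting $\Delta = 0$ in the second model recovers the first, which is consistent with the finding that the main-effects-only extension suffices when haplotypes and stratifying variables do not interact.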
