Comparison of haplotype-based tests for detecting gene-environment interactions with rare variants

Dissecting the genetic mechanism underlying a complex disease hinges on discovering gene-environment interactions (GXE). However, detecting GXE is challenging, especially when the genetic variants under study are rare. As highlighted in the recent literature, haplotype-based tests have several advantages over the so-called collapsing tests for detecting rare variants. It is therefore of practical interest to compare haplotype-based tests for detecting GXE, including recent ones developed specifically for rare haplotypes. We compare the following methods: haplo.glm, hapassoc, HapReg, the Bayesian hierarchical generalized linear model (BhGLM), and logistic Bayesian LASSO (LBL). We simulate data under different types of association scenarios and levels of gene-environment dependence. We find that when the type I error rates are controlled to be the same for all methods, LBL is the most powerful method for detecting GXE. We also applied the methods to a lung cancer data set, focusing on region 15q25.1, which has been suggested in the literature to interact with smoking in affecting lung cancer susceptibility and to be associated with smoking behavior. LBL and BhGLM were able to detect a rare haplotype-smoking interaction in this region. We further analyzed sequence data from the Dallas Heart Study, a population-based multi-ethnic study. Specifically, we considered haplotype blocks in the gene ANGPTL4 for association with the trait serum triglyceride, using ethnicity as a covariate. Only LBL found interactions of haplotypes with race (Hispanic). Overall, LBL appears to be the best of the methods studied here for detecting GXE, although it requires the most computation time.
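The power comparison rests on first putting all methods on the same type I error footing. A minimal sketch of that idea follows, under loudly stated assumptions: `method_A` and `method_B` are hypothetical placeholders, their test statistics are simulated Gaussian draws rather than output from haplo.glm, hapassoc, HapReg, BhGLM, or LBL, and the function names are illustrative only. Each method's rejection cutoff is taken from its own null replicates so that both reject at the same empirical rate, and power is then measured against that calibrated cutoff.

```python
import random


def calibrated_threshold(null_stats, alpha=0.05):
    """Cutoff such that at most a fraction alpha of the null statistics strictly exceed it."""
    ordered = sorted(null_stats)
    n_reject = int(alpha * len(ordered))  # null replicates allowed to reject
    return ordered[len(ordered) - n_reject - 1]


def rejection_rate(stats, threshold):
    """Fraction of replicates whose statistic strictly exceeds the cutoff."""
    return sum(s > threshold for s in stats) / len(stats)


def compare_methods(n_rep=2000, alpha=0.05, seed=1):
    """Calibrate two hypothetical methods to the same type I error, then compare power."""
    rng = random.Random(seed)
    # Assumption: toy statistics. The two methods have different null scales,
    # so a single shared nominal cutoff would give them unequal type I error.
    scales = {"method_A": 1.0, "method_B": 2.0}
    effect = 2.0  # hypothetical shift of the statistic under the alternative
    results = {}
    for name, sd in scales.items():
        null_stats = [rng.gauss(0.0, sd) for _ in range(n_rep)]
        alt_stats = [rng.gauss(effect, sd) for _ in range(n_rep)]
        cutoff = calibrated_threshold(null_stats, alpha)
        results[name] = {
            "type_I_error": rejection_rate(null_stats, cutoff),
            "power": rejection_rate(alt_stats, cutoff),
        }
    return results


if __name__ == "__main__":
    for name, res in compare_methods().items():
        print(name, res)
```

Calibrating against each method's own null distribution, rather than trusting a nominal significance level, is what makes the resulting power values comparable across methods that are conservative or liberal to different degrees.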
