A partially linear tree‐based regression model for assessing complex joint gene–gene and gene–environment effects

The success of genetic dissection of complex diseases may greatly benefit from judicious exploration of joint gene effects, which, in turn, critically depends on the power of statistical tools. Standard regression models are convenient for assessing main effects and low‐order gene–gene interactions but not for exploring complex higher‐order interactions. Tree‐based methodology is an attractive alternative for disentangling possible interactions, but it has difficulty modeling additive main effects. This work proposes a new class of semiparametric regression models, termed partially linear tree‐based regression (PLTR) models, which combine the advantages of generalized linear regression and tree models. A PLTR model quantifies the joint effects of genes and other risk factors by combining linear main effects with a non‐parametric tree structure. We propose an iterative algorithm to fit the PLTR model, and a unified resampling approach for identifying the optimal “pruned” tree nested within the tree produced by the fitting algorithm and for testing its significance. Simulation studies showed that the resampling procedure maintained the correct type I error rate. We applied the PLTR model to assess the association between biliary stone risk and 53 single nucleotide polymorphisms (SNPs) in the inflammation pathway in a population‐based case‐control study. The analysis yielded a parsimonious summary of the joint effect of all SNPs. The proposed model is also useful for exploring gene–environment interactions and has broad implications for applying the tree methodology to genetic epidemiology research. Genet. Epidemiol. © 2007 Wiley‐Liss, Inc.
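The abstract describes the PLTR structure only in words. As a rough illustration, assuming an identity link (a continuous outcome) rather than the logistic link a case-control analysis would need, a model of this shape can be written y = Zβ + f(X) + ε, where Z holds covariates with linear effects and f is piecewise constant over the terminal nodes of a tree grown on the genotypes X. The sketch below fits it by alternating a least-squares step for β with a small tree grown on the partial residuals. This backfitting-style loop, the helper names `fit_tree`, `predict_tree` and `fit_pltr`, the depth and leaf-size settings, and the simulated data are all hypothetical choices for illustration, not the authors' published algorithm, and the sketch omits both the pruning step and the resampling test.

```python
import numpy as np


def fit_tree(X, r, depth=0, max_depth=2, min_leaf=20):
    """Grow a small least-squares regression tree on genotype matrix X.

    Each internal node splits on X[:, j] <= c; each node stores the mean
    of the residuals r that fall into it.
    """
    node = {"value": float(r.mean())}
    if depth >= max_depth or len(r) < 2 * min_leaf:
        return node
    base_sse = float(((r - r.mean()) ** 2).sum())
    best = None
    for j in range(X.shape[1]):
        # Candidate cut points: every observed genotype value except the largest.
        for c in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= c
            n_left = int(left.sum())
            if n_left < min_leaf or len(r) - n_left < min_leaf:
                continue
            rl, rr = r[left], r[~left]
            sse = float(((rl - rl.mean()) ** 2).sum() + ((rr - rr.mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, c, left)
    if best is None or best[0] >= base_sse:
        return node  # no admissible split reduces the squared error
    _, j, c, left = best
    node["feature"], node["cut"] = j, float(c)
    node["left"] = fit_tree(X[left], r[left], depth + 1, max_depth, min_leaf)
    node["right"] = fit_tree(X[~left], r[~left], depth + 1, max_depth, min_leaf)
    return node


def predict_tree(node, x):
    """Drop one genotype vector x down the tree and return its leaf value."""
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["cut"] else node["right"]
    return node["value"]


def fit_pltr(y, Z, X, n_iter=50, tol=1e-6, **tree_args):
    """Alternate a linear step (intercept + Z) and a tree step (X), identity link."""
    D = np.column_stack([np.ones(len(y)), Z])
    beta = np.zeros(D.shape[1])
    f = np.zeros(len(y))
    tree = {"value": 0.0}
    for _ in range(n_iter):
        # Linear step: least squares of y minus the current tree fit on [1, Z].
        beta_new, *_ = np.linalg.lstsq(D, y - f, rcond=None)
        # Tree step: regrow the tree on the partial residuals of the linear part.
        tree = fit_tree(X, y - D @ beta_new, **tree_args)
        f = np.array([predict_tree(tree, x) for x in X])
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, tree, f


# Simulated example: one covariate with a linear effect, five SNPs coded 0/1/2,
# and a joint effect present only when SNP 0 and SNP 1 both carry a minor allele.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 1))
X = rng.integers(0, 3, size=(n, 5))
joint = (X[:, 0] >= 1) & (X[:, 1] >= 1)
y = 0.5 * Z[:, 0] + 2.0 * joint + rng.normal(scale=0.5, size=n)

beta, tree, f = fit_pltr(y, Z, X, max_depth=2, min_leaf=20)
print("linear coefficients (intercept, Z):", np.round(beta, 2))
print("root split on SNP", tree.get("feature"), "at genotype <=", tree.get("cut"))
```

Because each tree is grown on residuals from a regression that includes an intercept, the fitted tree values average to zero over the sample, so the overall level stays in the linear part and the tree carries only the departures from it. A case-control version of this sketch would need a logistic link, for example by fitting the linear part as a logistic regression with the current tree fit as an offset, and it would still need a pruning step and a significance test before the tree could be interpreted.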
