Comparison of multivariate adaptive regression splines and logistic regression in detecting SNP-SNP interactions and their application in prostate cancer

Abstract

Single nucleotide polymorphism (SNP) interactions play a critical role in complex diseases. The primary limitation of logistic regression (LR) in testing SNP-SNP interactions is that coefficient estimates may not be valid because of the large number of terms in a model. Multivariate adaptive regression splines (MARS) have features that effectively reduce the number of terms in a model. To study whether MARS addresses this limitation better than LR, we compared the power of MARS and LR, with SNPs coded under the reference-coding and additive-mode schemes, using simulated data of ten SNPs for 400 subjects based on 1,000 replications for five interaction models. Overall, MARS performed better than LR across the scenarios. In the model with a dominant two-way interaction, the power range was 76-96% for MARS and 1-8% for LR under both coding schemes. In the dominant three-way interaction model, the power was 57-85% for MARS and less than 4% for LR. In the prostate cancer example, we evaluated the association between ten SNPs and prostate cancer risk in 649 Caucasians. MARS selected a best model containing one two-way and one three-way interaction. These findings suggest that MARS may provide a useful tool for exploring SNP-SNP interactions.
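The two coding schemes and the simulated design can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the study's simulation code: the function names, the minor allele frequency, the baseline risk, and the interaction effect size are all hypothetical, since the abstract does not report them. Additive-mode coding is taken to mean one minor-allele count per SNP, and reference coding to mean two indicator variables per SNP with the major homozygote as the reference level. No MARS or LR model is fitted here.

```python
import math
import random


def additive_coding(row):
    """Additive-mode coding: one column per SNP, the minor-allele count (0, 1, 2)."""
    return [float(g) for g in row]


def reference_coding(row):
    """Reference coding: two indicator columns per SNP (heterozygote, minor
    homozygote); the major homozygote (genotype 0) is the reference level."""
    coded = []
    for g in row:
        coded.extend([1.0 if g == 1 else 0.0, 1.0 if g == 2 else 0.0])
    return coded


def simulate_dominant_two_way(n_subjects=400, n_snps=10, maf=0.3,
                              base_logit=-1.0, log_or=1.5, seed=0):
    """Simulate genotypes and case status under a dominant two-way interaction.

    Assumed values (not from the source): maf, base_logit, log_or.
    Risk is raised only for subjects carrying at least one minor allele
    at BOTH of the first two SNPs; the other SNPs are noise.
    """
    rng = random.Random(seed)
    genotypes, status = [], []
    for _ in range(n_subjects):
        # Each genotype is the sum of two independent allele draws.
        row = [sum(rng.random() < maf for _ in range(2)) for _ in range(n_snps)]
        carrier = row[0] > 0 and row[1] > 0
        logit = base_logit + (log_or if carrier else 0.0)
        p_case = 1.0 / (1.0 + math.exp(-logit))
        genotypes.append(row)
        status.append(1 if rng.random() < p_case else 0)
    return genotypes, status


if __name__ == "__main__":
    genotypes, status = simulate_dominant_two_way()
    print("additive columns per subject: ", len(additive_coding(genotypes[0])))
    print("reference columns per subject:", len(reference_coding(genotypes[0])))
    print("cases among subjects:", sum(status), "of", len(status))
```

With ten SNPs, reference coding doubles the number of main-effect columns relative to additive coding (20 versus 10), and every interaction term multiplies that count further. This is the growth in model terms that the abstract identifies as the main limitation of LR.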
