SVM‐Based Generalized Multifactor Dimensionality Reduction Approaches for Detecting Gene‐Gene Interactions in Family Studies

Gene‐gene interaction plays an important role in the etiology of complex diseases and may exist even in the absence of a genetic main effect. Most current statistical approaches, however, assess an interaction effect only in the presence of the genes' main effects. It would therefore be valuable to develop methods that detect both main effects and gene‐gene interaction effects, regardless of whether main effects exist, while adjusting for confounding factors. In addition, when a disease variant is rare or the sample size is limited, asymptotic statistical properties may not hold, so approaches built on a sound computational framework are more practical and more widely applicable. In this study, we developed an extended support vector machine (SVM) method and an SVM‐based pedigree‐based generalized multifactor dimensionality reduction (PGMDR) method to study interactions in the presence or absence of the genes' main effects, with adjustment for covariates, using limited samples of families. We propose a new test statistic for classifying affected and unaffected individuals in the SVM‐based PGMDR approach to improve performance in detecting gene‐gene interactions. Simulation studies under various scenarios were performed to compare the proposed methods with the original ones, and both the proposed and original approaches were applied to a real data example for illustration and comparison. Both the simulation and real data studies show that the proposed SVM and SVM‐based PGMDR methods achieve high prediction accuracy, consistency, and power in detecting gene‐gene interactions.
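To make the dimensionality-reduction idea concrete, the following is a minimal sketch of the generic GMDR-style step that pedigree-based GMDR builds on: each multilocus genotype cell is labeled high or low risk by summing per-individual score residuals from a covariate-only null model, and the resulting one-dimensional classifier is scored by balanced accuracy. It is not the authors' SVM-based test statistic, which this abstract does not specify. The function names, the intercept-only null model used in the example (so the residual is simply y minus the sample mean), the 0/1/2 genotype coding, and the zero threshold are all illustrative assumptions.

```python
from collections import defaultdict


def label_cells(genotypes, residuals, threshold=0.0):
    """Label each multilocus genotype cell high risk (1) or low risk (0).

    genotypes: list of tuples, e.g. (g1, g2) with each g coded 0/1/2.
    residuals: per-individual score residuals from a covariate-only null
               model (assumed precomputed, e.g. y - p_hat for logistic).
    A cell is labeled high risk when its residual sum exceeds threshold.
    """
    sums = defaultdict(float)
    for cell, r in zip(genotypes, residuals):
        sums[cell] += r
    return {cell: int(total > threshold) for cell, total in sums.items()}


def classify(genotypes, cell_labels, default=0):
    """Predict affection status from the pooled high/low-risk labels."""
    return [cell_labels.get(cell, default) for cell in genotypes]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity, robust to case/control imbalance."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Toy two-locus data (hypothetical): genotype pairs and affection status.
genotypes = [(0, 0), (0, 0), (1, 1), (1, 1), (2, 0), (2, 0), (0, 2), (0, 2)]
y = [0, 0, 1, 1, 0, 1, 1, 1]

# Intercept-only null model: fitted probability is the sample mean.
p_hat = sum(y) / len(y)
residuals = [yi - p_hat for yi in y]

labels = label_cells(genotypes, residuals)
pred = classify(genotypes, labels)
ba = balanced_accuracy(y, pred)
print(labels, round(ba, 3))
```

In practice, GMDR-type methods evaluate such a classifier within cross-validation and across candidate locus combinations; adding covariates changes only how the residuals are computed, which is what lets the same pooling step adjust for confounders.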
