An Analytic Approach Using Candidate Gene Selection and Logic Forest to Identify Gene by Environment Interactions (G × E) for Systemic Lupus Erythematosus in African Americans

Development and progression of many human diseases, such as systemic lupus erythematosus (SLE), are hypothesized to result from interactions between genetic and environmental factors. Current approaches to identifying and evaluating interactions are limited and most often focus on main effects and two-way interactions. Although higher-order interactions associated with disease have been documented, they are difficult to detect, because expanding the search space to all possible interactions among p predictors means evaluating 2^p − 1 terms. For example, a dataset with 150 candidate predictors requires considering over 10^45 main effects and interactions. In this study, we present an analytical approach that (1) selects candidate single nucleotide polymorphisms (SNPs) and environmental and/or clinical factors, (2) uses Logic Forest to identify predictors of disease, including higher-order interactions, and (3) confirms the association of the identified predictors and interactions with disease outcome using logistic regression. We applied this approach to a study investigating whether smoking and/or secondhand smoke exposure interacts with candidate SNPs to elevate the risk of SLE. The approach identified both genetic and environmental risk factors, with evidence suggesting a potential interaction between childhood exposure to secondhand smoke and genetic variation in the ITGAM gene that is associated with increased risk of SLE.
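To make the size of the search space concrete, the short sketch below counts candidate terms. It treats every model term as a nonempty subset of the p predictors: subsets of size 1 are main effects and subsets of size k ≥ 2 are k-way interactions, so summing the binomial coefficients C(p, k) for k = 1..p gives 2^p − 1. The helper name `n_terms` is hypothetical and used only for illustration; it is not part of Logic Forest or of the authors' analysis.

```python
from math import comb
from typing import Optional


def n_terms(p: int, max_order: Optional[int] = None) -> int:
    """Count main effects plus interaction terms among p predictors.

    Each term is a nonempty subset of the predictors (hypothetical helper,
    for illustration only). If max_order is given, only subsets of size
    1..max_order are counted, e.g. max_order=2 keeps main effects and
    two-way interactions.
    """
    top = p if max_order is None else min(max_order, p)
    return sum(comb(p, k) for k in range(1, top + 1))


p = 150
full = n_terms(p)                # all orders: equals 2**p - 1
up_to_two_way = n_terms(p, 2)    # main effects + two-way interactions only

assert full == 2**p - 1
print(f"all orders:     {full:.3e}")
print(f"main + two-way: {up_to_two_way}")
```

With p = 150, restricting attention to main effects and two-way interactions leaves 11,325 terms, whereas allowing interactions of every order gives roughly 1.43 × 10^45, which is why exhaustive enumeration of higher-order interactions is impractical.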
