Polymorphism Interaction Analysis (PIA): a method for investigating complex gene-gene interactions

Background: The risk of common diseases is likely determined by the complex interplay between environmental and genetic factors, including single nucleotide polymorphisms (SNPs). Traditional methods of data analysis are poorly suited to detecting complex interactions because data become sparse in high dimensions, which often occurs when a large number of SNPs are genotyped in a relatively small number of samples. Associations should be validated using multiple methods to minimize the likelihood of false positives. Moreover, high-throughput genotyping methods allow investigators to genotype thousands of SNPs at one time. Investigating associations for each individual SNP, or interactions between SNPs, using traditional approaches is inefficient and prone to false positives.

Results: We developed the Polymorphism Interaction Analysis tool (PIA version 2.0) to include different approaches for ranking and scoring SNP combinations, to account for imbalances between case and control ratios, to stratify on particular factors, and to examine associations of user-defined pathways (based on SNP or gene) with case status. PIA v. 2.0 detected 2-SNP interactions as the highest ranking model 77% of the time, using simulated data sets of genetic models of interaction (minor allele frequency = 0.2; heritability = 0.01; N = 1600) generated previously [Velez DR, White BC, Motsinger AA, Bush WS, Ritchie MD, Williams SM, Moore JH: A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 2007, 31:306–315.]. Interacting SNPs were detected in both balanced (20 SNPs) and imbalanced data (case:control 1:2 and 1:4, 10 SNPs) in the context of non-interacting SNPs.

Conclusion: PIA v. 2.0 is a useful tool for exploring gene*gene or gene*environment interactions and identifying a small number of putative associations, which may be investigated further using other statistical methods and in replication study populations.
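
To make the idea of ranking SNP combinations under case:control imbalance concrete, the following is a minimal sketch, not PIA's actual implementation. The abstract does not specify PIA's scoring functions, so this assumes an MDR-style rule (a genotype cell is called high risk when its case:control ratio exceeds the overall ratio) scored by balanced accuracy, in the spirit of the cited Velez et al. paper. All function names are hypothetical, genotypes are assumed to be coded 0/1/2, and the score is computed in-sample with no cross-validation or permutation testing.

```python
from collections import defaultdict
from itertools import combinations


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity; insensitive to case:control ratio."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def score_pair(genotypes, status, i, j):
    """Score the SNP pair (i, j) with an MDR-style high/low-risk labelling.

    genotypes: sequence of per-sample genotype tuples (assumed coded 0/1/2)
    status:    sequence of 1 (case) or 0 (control), same length as genotypes
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases

    # Count cases and controls in each two-locus genotype cell.
    cells = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for row, s in zip(genotypes, status):
        cells[(row[i], row[j])][0 if s == 1 else 1] += 1

    tp = fn = tn = fp = 0
    for cases, controls in cells.values():
        # High risk if the cell's case:control ratio exceeds the overall
        # ratio (cross-multiplied to avoid dividing by zero).
        if cases * n_controls > controls * n_cases:
            tp += cases
            fp += controls
        else:
            fn += cases
            tn += controls
    return balanced_accuracy(tp, fn, tn, fp)


def rank_pairs(genotypes, status):
    """Return [((i, j), score), ...] for all SNP pairs, best score first."""
    n_snps = len(genotypes[0])
    scored = [
        ((i, j), score_pair(genotypes, status, i, j))
        for i, j in combinations(range(n_snps), 2)
    ]
    return sorted(scored, key=lambda item: -item[1])


if __name__ == "__main__":
    # Toy data: 3 SNPs, case status depends only on SNPs 0 and 1 jointly.
    genotypes = [(a, b, c) for a in range(3) for b in range(3) for c in range(3)]
    status = [(a + b) % 2 for a, b, c in genotypes]
    for pair, score in rank_pairs(genotypes, status):
        print(pair, round(score, 3))
```

Because balanced accuracy averages sensitivity and specificity, a model cannot score well simply by assigning every sample to the majority class, which is the concern with 1:2 or 1:4 case:control ratios. A real analysis would add cross-validation and some form of significance testing before treating a top-ranked pair as a putative interaction.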

[1]  J. H. Moore,et al.  Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. , 2001, American journal of human genetics.

[2]  N E Day,et al.  Multi-factor dimensionality reduction applied to a large prospective investigation on gene-gene and gene-environment interactions. , 2006, Carcinogenesis.

[3]  Scott M. Williams,et al.  A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction , 2007, Genetic epidemiology.

[4]  P. Donnelly,et al.  Replicating genotype–phenotype associations , 2007, Nature.

[5]  Jason H. Moore,et al.  The Ubiquitous Nature of Epistasis in Determining Susceptibility to Common Human Diseases , 2003, Human Heredity.

[6]  Hugues Sicotte,et al.  SNP500Cancer: a public resource for sequence validation, assay development, and frequency analysis for genetic variation in candidate genes , 2005, Nucleic Acids Res..

[7]  J. Gray,et al.  The genetics and genomics of cancer , 2003, Nature Genetics.

[8]  Margaret R Karagas,et al.  Concordance of multiple analytical approaches demonstrates a complex relationship between DNA repair gene SNPs, smoking and bladder cancer susceptibility. , 2006, Carcinogenesis.

[9]  P. McKeigue,et al.  Problems of reporting genetic associations with complex outcomes , 2003, The Lancet.

[10]  A. Latiano,et al.  Regularized Least Squares Classifiers may Predict Crohn's Disease from Profiles of Single Nucleotide Polymorphisms , 2007, Annals of human genetics.

[11]  Jason H. Moore,et al.  Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions , 2003, Bioinform..

[12]  R. Millikan,et al.  Bladder cancer predisposition: a multigenic approach to DNA-repair and cell-cycle-control genes. , 2006, American journal of human genetics.

[13]  D. Hunter,et al.  Molecular Epidemiology of Cancer , 2005, CA: a cancer journal for clinicians.

[14]  A. G. Heidema,et al.  The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases , 2006, BMC Genetics.

[15]  J. Ioannidis,et al.  Replication validity of genetic association studies , 2001, Nature Genetics.

[16]  N. Chatterjee,et al.  Powerful multilocus tests of genetic association in the presence of gene-gene and gene-environment interactions. , 2006, American journal of human genetics.

[17]  E. Lander,et al.  Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease , 2003, Nature Genetics.

[18]  Jason H. Moore,et al.  An application of conditional logistic regression and multifactor dimensionality reduction for detecting gene-gene Interactions on risk of myocardial infarction: The importance of model validation , 2004, BMC Bioinformatics.

[19]  Jason H. Moore,et al.  Power of multifactor dimensionality reduction for detecting gene‐gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity , 2003, Genetic epidemiology.

[20]  U. Langsenlehner,et al.  A multigenic approach to predict breast cancer risk , 2007, Breast Cancer Research and Treatment.

[21]  Margaret A. Pericak-Vance,et al.  Complex gene–gene interactions in multiple sclerosis: a multifactorial approach reveals associations with inflammatory genes , 2006, Neurogenetics.

[22]  Jonathan L Haines,et al.  Genetics, statistics and human disease: analytical retooling for complexity. , 2004, Trends in genetics : TIG.

[23]  Leah E. Mechanic,et al.  Exploring SNP‐SNP interactions and colon cancer risk using polymorphism interaction analysis , 2006, International journal of cancer.