Multifactor dimensionality reduction analysis identifies specific nucleotide patterns promoting genetic polymorphisms

Background
The fidelity of DNA replication serves as the nidus for both genetic evolution and genomic instability fostering disease. Single nucleotide polymorphisms (SNPs) constitute greater than 80% of the genetic variation between individuals. A new theory regarding DNA replication fidelity has emerged in which selectivity is governed by base-pair geometry through interactions between the selected nucleotide, the complementary strand, and the polymerase active site. We hypothesize that specific nucleotide combinations in the flanking regions of SNP fragments are associated with mutation.

Results
We modeled the relationship between DNA sequence and observed polymorphisms using the novel multifactor dimensionality reduction (MDR) approach. MDR was originally developed to detect synergistic interactions between multiple SNPs that are predictive of disease susceptibility. We initially assembled data from the Broad Institute as a pilot test for the hypothesis that flanking region patterns associate with mutagenesis (n = 2194). We then confirmed and expanded our inquiry with human SNPs within coding regions and their flanking sequences collected from the National Center for Biotechnology Information (NCBI) database (n = 29967) and a control set of sequences (coding region) not associated with SNP sites randomly selected from the NCBI database (n = 29967). We discovered seven flanking region pattern associations in the Broad dataset which reached a minimum significance level of p ≤ 0.05. Significant models (p << 0.001) were detected for each SNP type examined in the larger NCBI dataset. Importantly, the flanking region models were elongated or truncated depending on the nucleotide change. Additionally, nucleotide distributions differed significantly at motif sites relative to the type of variation observed. The MDR approach effectively discerned specific sites within the flanking regions of observed SNPs and their respective identities, supporting the collective contribution of these sites to SNP genesis.

Conclusion
The present study represents the first use of this computational methodology for modeling nonlinear patterns in molecular genetics. MDR was able to identify distinct nucleotide patterning around sites of mutations dependent upon the observed nucleotide change. We discovered one flanking region set that included five nucleotides clustered around a specific type of SNP site. Based on the strongly associated patterns identified in this study, it may become possible to scan genomic databases for such clustering of nucleotides in order to predict likely sites of future SNPs, and even the type of polymorphism most likely to occur.
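
To make the core MDR step concrete, the following is a minimal sketch of how flanking positions could be scored jointly. It is not the authors' pipeline or the MDR software: the function names (`mdr_accuracy`, `best_model`), the toy sequences, and the fixed case:control threshold are all assumptions made for illustration, and the cross-validation and permutation testing that MDR normally relies on are omitted. Each combination of nucleotides at the chosen positions is treated as one cell, a cell is labeled high risk when SNP-flank sequences outnumber control sequences in it, and the model is scored by how many sequences that labeling classifies correctly.

```python
from collections import Counter
from itertools import combinations


def mdr_accuracy(cases, controls, positions, threshold=1.0):
    """Score one candidate model: the set of flank positions in `positions`.

    Every distinct nucleotide combination at those positions is one cell.
    A cell is 'high risk' when its case count exceeds threshold * control count.
    """
    case_counts = Counter(tuple(seq[p] for p in positions) for seq in cases)
    ctrl_counts = Counter(tuple(seq[p] for p in positions) for seq in controls)
    correct = 0
    for cell in set(case_counts) | set(ctrl_counts):
        n_case, n_ctrl = case_counts[cell], ctrl_counts[cell]
        high_risk = n_case > threshold * n_ctrl
        # High-risk cells classify their cases correctly; low-risk cells their controls.
        correct += n_case if high_risk else n_ctrl
    return correct / (len(cases) + len(controls))


def best_model(cases, controls, n_positions, order):
    """Exhaustively search all position sets of a given size (no cross-validation)."""
    return max(
        combinations(range(n_positions), order),
        key=lambda pos: mdr_accuracy(cases, controls, pos),
    )


# Hypothetical toy data: 4 flank positions; only positions 0 and 2 matter, and only jointly.
cases = ["AAGA", "ACGC", "CATA", "CCTC"]      # flanks of observed SNPs (invented)
controls = ["AATA", "ACTC", "CAGA", "CCGC"]   # flanks of non-SNP sites (invented)

print(best_model(cases, controls, n_positions=4, order=2))
print(mdr_accuracy(cases, controls, (0, 2)))
```

In this toy set no single position separates the two groups, while the pair of positions 0 and 2 does, which is the kind of nonlinear, multi-site pattern the abstract describes MDR detecting in flanking regions.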
