Data mining: Efficiency of using sequence databases for polymorphism discovery

An open question in research on single nucleotide polymorphisms (SNPs) is what percentage of true SNPs is found by in silico pre‐screening. To address it, we selected 13 genes and determined the complete collection of “true” polymorphisms, that is, polymorphisms detected experimentally, in these genes, either in our laboratory using denaturing high‐performance liquid chromatography (DHPLC) and fluorescent sequencing or in other laboratories using comparable methods. The genes studied by our group were PTGS2, IGFBP1, IGFBP3, and CYP19. GenBank sequence information was then aligned using two methods, and the sequence differences were termed “candidate” polymorphisms. Comparing the SNPs obtained experimentally with those obtained in silico, we found that in silico methods are relatively specific (up to 55% of candidate SNPs found by SNPFinder were also discovered experimentally) but have low sensitivity (no more than 27% of true SNPs were found by in silico methods). Hum Mutat 17:141–150, 2001. © 2001 Wiley‐Liss, Inc.
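
The two percentages quoted in the abstract can be read as simple set ratios: the share of candidate SNPs that were also found experimentally (what the abstract calls specificity, often termed precision elsewhere) and the share of true SNPs that the in silico screen recovered (sensitivity). The sketch below illustrates that arithmetic only. The function name, the gene labels, and the positions are hypothetical and are not the authors' data or code.

```python
from typing import Set, Tuple

# A SNP is identified here by (gene label, position); both are hypothetical.
Snp = Tuple[str, int]


def compare_snp_sets(true_snps: Set[Snp], candidate_snps: Set[Snp]) -> Tuple[float, float]:
    """Return (fraction of candidates confirmed, fraction of true SNPs recovered)."""
    confirmed = true_snps & candidate_snps  # SNPs found by both approaches
    # Guard against empty sets so the ratios are always defined.
    frac_candidates_confirmed = len(confirmed) / len(candidate_snps) if candidate_snps else 0.0
    frac_true_recovered = len(confirmed) / len(true_snps) if true_snps else 0.0
    return frac_candidates_confirmed, frac_true_recovered


# Made-up example: four experimentally detected SNPs, two in silico candidates.
true_snps = {("GENE_A", 101), ("GENE_A", 250), ("GENE_A", 377), ("GENE_B", 42)}
candidate_snps = {("GENE_A", 101), ("GENE_B", 900)}

confirmed_fraction, recovered_fraction = compare_snp_sets(true_snps, candidate_snps)
print(f"candidates confirmed: {confirmed_fraction:.0%}")
print(f"true SNPs recovered: {recovered_fraction:.0%}")
```

In this made-up case one of the two candidates is confirmed and one of the four true SNPs is recovered, so the two ratios are 50% and 25%.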
