PolyDoms: a whole genome database for the identification of non-synonymous coding SNPs with the potential to impact disease

As knowledge of human genetic polymorphisms grows, so do the opportunity and the challenge of identifying those polymorphisms that may affect an individual's health or disease risk. A critical need is to organize large-scale polymorphism analyses and to prioritize candidate non-synonymous coding SNPs (nsSNPs) that should be tested in experimental and epidemiological studies to establish their context-specific impacts on protein function. In addition, with emerging high-resolution clinical genetic testing, new polymorphisms must be analyzed in the context of all available protein feature knowledge, including other known mutations and polymorphisms. To address this need, we developed PolyDoms, a database that integrates the results of multiple algorithmic procedures and functional criteria applied to the entire Entrez dbSNP dataset. In addition to predicting the structural and functional impacts of all nsSNPs, its filtering functions enable group-based identification of potentially harmful nsSNPs among multiple genes associated with specific diseases, anatomies, mammalian phenotypes, gene ontologies, pathways or protein domains. PolyDoms thus provides a means to derive a list of candidate SNPs to be evaluated in experimental or epidemiological studies for their impact on protein function and their association with disease risk. PolyDoms will continue to be curated to improve its usefulness.
