The Predicted Impact of Coding Single Nucleotide Polymorphisms Database

Nonsynonymous single nucleotide polymorphisms (nsSNPs) can alter the structure or function of expressed proteins and are therefore likely to act as modifiers of inherited susceptibility. To facilitate sequence-based association studies, we have classified and catalogued the predicted functionality of nsSNPs in genes relevant to the biology of cancer. Candidate genes were identified by querying the Gene Ontology Consortium database, the Kyoto Encyclopedia of Genes and Genomes database, Iobion's Interaction Explorer PathwayAssist Program, the National Center for Biotechnology Information (NCBI) Entrez Gene database, and the CancerGene database with targeted search terms and pathways. A total of 9,537 validated nsSNPs located within annotated genes were retrieved from NCBI dbSNP Build 123. Filtering this list and linking it to 7,080 candidate genes yielded 3,666 validated nsSNPs with minor allele frequencies ≥0.01 in Caucasian populations. For nsSNPs in genes with a single mRNA transcript, the functional effect was predicted using three computational tools: the Grantham matrix, Polymorphism Phenotyping (PolyPhen), and Sorting Intolerant from Tolerant (SIFT). The resulting pool of 3,009 fully annotated nsSNPs is accessible from the Predicted Impact of Coding SNPs (PICS) database at http://www.icr.ac.uk/cancgen/molgen/MolPopGen_PICS_database.htm. PICS is an ongoing project that will continue to curate and release data on the putative functionality of coding SNPs.
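
The selection steps described above (validated nsSNPs, linkage to candidate genes, a minor allele frequency cutoff of 0.01, and restriction to single-transcript genes before functional prediction) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record layout, field names, function names, gene symbols, and SNP identifiers below are all hypothetical placeholders chosen for illustration, and only the 0.01 threshold comes from the text.

```python
from dataclasses import dataclass

MAF_THRESHOLD = 0.01  # minor allele frequency cutoff stated in the abstract


@dataclass(frozen=True)
class NsSnp:
    """Hypothetical record for one nsSNP; field names are assumptions."""
    snp_id: str            # placeholder identifier, not a real dbSNP rs number
    gene: str              # placeholder gene symbol
    validated: bool        # whether the SNP is marked as validated
    maf: float             # minor allele frequency in the population of interest
    transcript_count: int  # number of annotated mRNA transcripts for the gene


def select_candidate_snps(snps, candidate_genes, maf_threshold=MAF_THRESHOLD):
    """Keep validated nsSNPs in candidate genes with MAF >= threshold."""
    return [
        s for s in snps
        if s.validated and s.gene in candidate_genes and s.maf >= maf_threshold
    ]


def eligible_for_prediction(snps):
    """Keep nsSNPs whose gene has exactly one mRNA transcript."""
    return [s for s in snps if s.transcript_count == 1]


# Toy input: invented values that exercise each filter once.
example_snps = [
    NsSnp("snp_a", "GENE_1", True, 0.12, 1),    # passes every filter
    NsSnp("snp_b", "GENE_1", True, 0.005, 1),   # MAF below the cutoff
    NsSnp("snp_c", "GENE_2", False, 0.30, 1),   # not validated
    NsSnp("snp_d", "GENE_3", True, 0.20, 1),    # gene not in the candidate set
    NsSnp("snp_e", "GENE_2", True, 0.01, 3),    # passes selection, multi-transcript
]
example_candidate_genes = {"GENE_1", "GENE_2"}

selected = select_candidate_snps(example_snps, example_candidate_genes)
predictable = eligible_for_prediction(selected)

print([s.snp_id for s in selected])
print([s.snp_id for s in predictable])
```

Separating the two filters mirrors the two counts reported in the abstract: one set after candidate-gene linkage and frequency filtering, and a smaller set that is carried forward to functional prediction.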
