LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources

MOTIVATION: The NCBI dbSNP database lists over 9 million single nucleotide polymorphisms (SNPs) in the human genome, but currently contains limited annotation information. SNPs that result in amino acid residue changes (nsSNPs) are of critical importance in variation between individuals, including disease and drug sensitivity.

RESULTS: We have developed LS-SNP, a genomic scale software pipeline to annotate nsSNPs. LS-SNP comprehensively maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. It currently annotates 28,043 validated SNPs that produce amino acid residue substitutions in human proteins from the SwissProt/TrEMBL database. Annotations can be viewed via a web interface either in the context of a genomic region or by selecting sets of SNPs, genes, proteins or pathways. These results are useful for identifying candidate functional SNPs within a gene, haplotype or pathway and in probing molecular mechanisms responsible for functional impacts of nsSNPs.

AVAILABILITY: http://www.salilab.org/LS-SNP

CONTACT: rachelk@salilab.org

SUPPLEMENTARY INFORMATION: http://salilab.org/LS-SNP/supp-info.pdf
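
The distinction the abstract draws between SNPs in general and nsSNPs (those that change an amino acid residue) can be illustrated with a minimal sketch. This is not the LS-SNP pipeline itself; it only shows, assuming the standard genetic code and a single in-frame codon, how a coding SNP might be labelled synonymous or non-synonymous. The function name `classify_coding_snp` and its return format are hypothetical, chosen for this illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, position, alt_base):
    """Classify a single-base change within one codon (hypothetical helper).

    ref_codon: reference codon, e.g. "GAA"
    position:  0-based index (0, 1 or 2) of the changed base within the codon
    alt_base:  the variant base at that position
    Returns (ref_residue, alt_residue, label).
    """
    ref_codon = ref_codon.upper()
    alt_base = alt_base.upper()
    if len(ref_codon) != 3 or position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("expected a 3-base codon, position 0-2, and a base in TCAG")
    if ref_codon[position] == alt_base:
        raise ValueError("variant base equals reference base")

    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    # A SNP is non-synonymous when the encoded residue changes.
    label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, label


if __name__ == "__main__":
    print(classify_coding_snp("GAA", 1, "T"))  # Glu codon with middle base changed
    print(classify_coding_snp("CTT", 2, "C"))  # Leu codon with third base changed
```

Third-position changes are often synonymous because of codon degeneracy, which is why only a subset of coding SNPs qualify as nsSNPs and need the structural and functional annotation described above.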
