BMC Bioinformatics BioMed Central Methodology article Genome bioinformatic analysis of nonsynonymous SNPs

Background: Genome-wide association studies of common diseases, which search for common, low-penetrance causal variants, are underway. A proportion of these variants will alter protein sequences, most commonly as non-synonymous single nucleotide polymorphisms (nsSNPs). It would be an advantage if the effects of an nsSNP on protein structure and function could be predicted, both for the final identification of a causal variant in a disease-associated chromosome region and for further functional analyses of the nsSNP and its disease-associated protein.

Results: In the present report we compare and contrast structure- and sequence-based prediction methods applied to over 5500 genes carrying nearly 24,000 nsSNPs, employing an automatic comparative modelling procedure to build structural models for the gene products. The nsSNP information came from two sources. The first was the OMIM database, whose nsSNPs are rare (minor allele frequency, MAF, < 0.01) and are known to cause penetrant, monogenic diseases. The second was dbSNP125, in which the vast majority of nsSNPs, mostly with MAF > 0.05, have no known link to a disease. For over 40% of the nsSNPs, structure-based methods predicted which sequence changes are likely either to disrupt the structure of the protein or to interfere with its function or interactions. For the remainder (about 60%), we generated sequence-based predictions.

Conclusion: We show that, in general, the prediction tools are able to distinguish disease-causing mutations from those thought to have a neutral effect. We give examples of mutations in genes that are predicted to be deleterious and may have a role in disease. Contrary to previous reports, we also show that rare mutations are consistently predicted to be deleterious as often as commonly occurring nsSNPs.
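
The triage described in the Results can be illustrated with a minimal sketch. It assumes only the two MAF thresholds quoted in the abstract (rare below 0.01, common above 0.05) and the rule that structure-based prediction is used when a comparative model exists, with sequence-based prediction otherwise. The `NsSNP` record, the function names, the "intermediate" and "unknown" frequency classes, and the example entries are all hypothetical and are not taken from the article's pipeline.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the abstract; the band between them is an assumption.
RARE_MAF = 0.01
COMMON_MAF = 0.05


@dataclass
class NsSNP:
    """Hypothetical record for one non-synonymous SNP."""
    gene: str                   # placeholder gene label
    change: str                 # amino acid change, e.g. "A123T"
    maf: Optional[float]        # minor allele frequency, None if unknown
    has_structural_model: bool  # True if a comparative model covers the site


def frequency_class(maf: Optional[float]) -> str:
    """Bin an nsSNP by minor allele frequency."""
    if maf is None:
        return "unknown"
    if maf < RARE_MAF:
        return "rare"
    if maf > COMMON_MAF:
        return "common"
    return "intermediate"  # assumed label for 0.01 <= MAF <= 0.05


def prediction_route(snp: NsSNP) -> str:
    """Pick the prediction approach: structure-based if a model exists."""
    return "structure-based" if snp.has_structural_model else "sequence-based"


# Made-up example entries, for illustration only.
examples = [
    NsSNP("GENE_A", "A123T", 0.005, True),
    NsSNP("GENE_B", "R45Q", 0.20, False),
]

for snp in examples:
    print(snp.gene, snp.change, frequency_class(snp.maf), prediction_route(snp))
```

Keeping the frequency class and the prediction route as separate functions mirrors the abstract, where allele frequency describes the data source and model availability decides which method can be applied.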
