dbNSFP: A Lightweight Database of Human Nonsynonymous SNPs and Their Functional Predictions

With the advance of sequencing technologies, whole exome sequencing has increasingly been used to identify mutations that cause human diseases, especially rare Mendelian diseases. Among the analysis steps, functional prediction (of being deleterious) plays an important role in filtering or prioritizing nonsynonymous SNP (NS) for further analysis. Unfortunately, different prediction algorithms use different information and each has its own strength and weakness. It has been suggested that investigators should use predictions from multiple algorithms instead of relying on a single one. However, querying predictions from different databases/Web‐servers for different algorithms is both tedious and time consuming, especially when dealing with a huge number of NSs identified by exome sequencing. To facilitate the process, we developed dbNSFP (database for nonsynonymous SNPs' functional predictions). It compiles prediction scores from four new and popular algorithms (SIFT, Polyphen2, LRT, and MutationTaster), along with a conservation score (PhyloP) and other related information, for every potential NS in the human genome (a total of 75,931,005). It is the first integrated database of functional predictions from multiple algorithms for the comprehensive collection of human NSs. dbNSFP is freely available for download at http://sites.google.com/site/jpopgen/dbNSFP. Hum Mutat 32:894–899, 2011. © 2011 Wiley‐Liss, Inc.

[1]  Shin Ishii,et al.  A Bayesian missing value estimation method for gene expression profile data , 2003, Bioinform..

[2]  M. Campbell,et al.  PANTHER: a library of protein families and subfamilies indexed by function. , 2003, Genome research.

[3]  Emidio Capriotti,et al.  Bioinformatics Original Paper Predicting the Insurgence of Human Genetic Diseases Associated to Single Point Protein Mutations with Support Vector Machines and Evolutionary Information , 2022 .

[4]  S. Henikoff,et al.  Predicting the effects of amino acid substitutions on protein function. , 2006, Annual review of genomics and human genetics.

[5]  David Haussler,et al.  New Methods for Detecting Lineage-Specific Selection , 2006, RECOMB.

[6]  Hagit Shatkay,et al.  F-SNP: computationally predicted functional SNPs for disease association studies , 2007, Nucleic Acids Res..

[7]  M. Vihinen,et al.  Pathogenic or not? And if so, then how? Studying the effects of missense mutations using bioinformatics methods , 2009, Human mutation.

[8]  Alexander R. Pico,et al.  SNPLogic: an interactive single nucleotide polymorphism selection, annotation, and prioritization system , 2008, Nucleic Acids Res..

[9]  P. Stenson,et al.  The Human Gene Mutation Database: 2008 update , 2009, Genome Medicine.

[10]  Jonathan M. Mudge,et al.  The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. , 2009, Genome research.

[11]  Justin C. Fay,et al.  Identification of deleterious mutations within three human genomes. , 2009, Genome research.

[12]  Rachel Karchin,et al.  Next generation tools for the annotation of human SNPs , 2009, Briefings Bioinform..

[13]  Peter Tarczy-Hornoch,et al.  SNPit: A federated data integration system for the purpose of functional SNP annotation , 2009, Comput. Methods Programs Biomed..

[14]  S. Henikoff,et al.  Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm , 2009, Nature Protocols.

[15]  Ernesto Picardi,et al.  Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing , 2010, Briefings Bioinform..

[16]  P. Bork,et al.  A method and server for predicting damaging missense mutations , 2010, Nature Methods.

[17]  Jay Shendure,et al.  Single-nucleotide evolutionary constraint scores highlight disease-causing mutations , 2010, Nature Methods.

[18]  K. Pollard,et al.  Detection of nonneutral substitution rates on mammalian phylogenies. , 2010, Genome research.

[19]  Shamil R Sunyaev,et al.  Pooled association tests for rare variants in exon-resequencing studies. , 2010, American journal of human genetics.

[20]  Jana Marie Schwarz,et al.  MutationTaster evaluates disease-causing potential of sequence alterations , 2010, Nature Methods.

[21]  H. Hakonarson,et al.  ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data , 2010, Nucleic acids research.

[22]  P. Shannon,et al.  Exome sequencing identifies the cause of a Mendelian disorder , 2009, Nature Genetics.

[23]  Roded Sharan,et al.  Translation efficiency in humans: tissue specificity, global optimization and differences between developmental stages , 2010, Nucleic acids research.

[24]  Emily H Turner,et al.  Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome , 2010, Nature Genetics.

[25]  Sharon R Grossman,et al.  Integrating common and rare genetic variation in diverse human populations , 2010, Nature.

[26]  J. Shendure,et al.  Massively parallel sequencing and rare disease. , 2010, Human molecular genetics.

[27]  Tero Aittokallio,et al.  Dealing with missing values in large-scale studies: microarray data imputation and beyond , 2010, Briefings Bioinform..

[28]  David Haussler,et al.  The UCSC Genome Browser database: update 2010 , 2009, Nucleic Acids Res..

[29]  Mostafa Ronaghi,et al.  pfSNP: An integrated potentially functional SNP resource that facilitates hypotheses generation through knowledge syntheses , 2011, Human mutation.