SDADB: a functional annotation database of protein structural domains

Abstract Annotating individual domains with functional terms is essential for understanding the functions of full-length proteins. We describe SDADB, a functional annotation database for structural domains. SDADB provides associations between Gene Ontology (GO) terms and SCOP domains, calculated with an integrated framework. Each GO annotation is assigned a probability of being correct. These probabilities are estimated with a Bayesian network that draws on structural neighborhood mappings, SCOP-InterPro domain mapping information, position-specific scoring matrices (PSSMs) and sequence homolog features; the most substantial contribution comes from high-coverage structure-based domain-protein mappings, which are computed using large-scale structure alignment. SDADB contains ontological terms with probabilistic scores for more than 214 000 distinct SCOP domains. Additional features include 3D structure alignment visualization, a GO hierarchical tree view, and search, browse and download options. Database URL: http://sda.denglab.org
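
To illustrate how several evidence sources could be combined into a single probability for a domain-GO association, here is a minimal sketch that assumes a naive Bayes simplification (conditionally independent evidence). It is not the Bayesian network used by SDADB; the function name `posterior_probability`, the prior and the likelihood ratios are all hypothetical values chosen for illustration.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Update prior odds with independent likelihood ratios (naive Bayes).

    prior: assumed prior probability that the annotation is correct.
    likelihood_ratios: one ratio P(evidence | correct) / P(evidence | incorrect)
        per evidence source.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    # Under the independence assumption, the ratios simply multiply.
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihood ratios for the four feature types named in the
# abstract; these numbers are invented, not taken from SDADB.
evidence = {
    "structural_neighborhood": 8.0,
    "scop_interpro_mapping": 3.0,
    "pssm": 1.5,
    "sequence_homolog": 2.0,
}

score = posterior_probability(prior=0.01, likelihood_ratios=evidence.values())
print(f"Estimated probability that the annotation is correct: {score:.3f}")
```

In this simplified picture, a source with a larger likelihood ratio moves the score more, which is one way to read the statement that structure-based mappings contribute most.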
