Structural descriptor database: a new tool for sequence-based functional site prediction

Background

The Structural Descriptor Database (SDDB) is a web-based tool that predicts protein function and the positions of functional sites from the structural properties of related protein families. Structural alignments and functional residues of a set of known proteins (the training set) are used to build specialized hidden Markov models (HMMs) called HMM descriptors. SDDB uses precomputed, stored HMM descriptors to predict active sites, binding residues, and protein function. The database integrates biologically relevant data filtered from several databases, including PDB, PDBsum, CSA, and SCOP. It accepts queries in FASTA format and predicts functional residue positions, protein-ligand interactions, and protein function, based on the SCOP database.

Results

To assess SDDB performance, we used different data sets. The trypsin-like serine protease data set assessed how well SDDB predicts functional sites when curated data are available. The SCOP family data set was used to analyze SDDB performance with training data extracted from PDBsum (binding sites) and from CSA (active sites). The ATP-binding experiment compared our approach with the most recent method. SDDB yielded significant improvements in all evaluations.

Conclusion

SDDB performed better when reliable training data were available. It predicted active sites better than binding sites because the former are more conserved than the latter. Nevertheless, our prediction method achieved precision above 70%.
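The idea of transferring annotated positions from a family model to a query, and then scoring the prediction by precision, can be illustrated with a minimal Python sketch. It is not SDDB's implementation: it assumes the query has already been aligned to a family profile (for example by an HMM search tool), and the function names `transfer_sites` and `precision_recall`, the toy aligned sequence, and the annotated columns are all hypothetical.

```python
def transfer_sites(aligned_query, functional_columns):
    """Map annotated alignment columns onto 1-based query residue positions.

    aligned_query      -- query sequence as aligned to the family profile,
                          with '-' or '.' marking gaps (assumed input format)
    functional_columns -- 0-based alignment columns annotated as functional
                          in the training set (hypothetical annotation)
    """
    predicted = []
    residue_index = 0
    for column, symbol in enumerate(aligned_query):
        if symbol in "-.":
            continue  # gap: the query has no residue in this column
        residue_index += 1
        if column in functional_columns:
            predicted.append((residue_index, symbol))
    return predicted


def precision_recall(predicted_positions, true_positions):
    """Return (precision, recall) for predicted versus known site positions."""
    predicted, truth = set(predicted_positions), set(true_positions)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented for illustration only.
sites = transfer_sites("AH-DGS", {1, 3, 5})
precision, recall = precision_recall([pos for pos, _ in sites], {2, 3, 4})
```

In this toy case, two of the three predicted positions match the assumed true sites, so precision and recall are both 2/3. A real evaluation would use curated site annotations, such as those the abstract draws from CSA or PDBsum.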
