Automated functional classification of experimental and predicted protein structures

Background: Proteins that are similar in sequence or structure may perform different functions in nature. In such cases, function cannot be inferred from sequence or structural similarity.

Results: We analyzed experimental structures from the Structural Classification of Proteins (SCOP) database and showed that about half of them belong to multi-functional fold families, for which protein similarity alone is not adequate to assign function. We also analyzed predicted structures from the LiveBench and PDB-CAFASP experiments and showed that accurate homology-based functional assignment cannot be achieved approximately one third of the time, namely when the protein is a member of a multi-functional fold family. We then carried out an extended performance evaluation and comparison on both experimental and predicted structures using the Functional Signatures from Structural Alignments (FSSA) algorithm, which we previously developed to classify proteins belonging to multi-functional fold families.

Conclusion: The results indicate that the FSSA algorithm is more accurate than homology-based approaches for functional classification of both experimental and predicted protein structures, in part because it uses local, as opposed to global, information to classify function. The FSSA algorithm has also been implemented as a webserver and is available at http://protinfo.compbio.washington.edu/fssa.
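The fold-family statistic reported in the Results can be illustrated with a small sketch. It is not the FSSA algorithm and not the authors' analysis pipeline; it only shows one way to count how many structures fall in families that carry more than one function. The record layout `(protein_id, family_id, function_label)`, the function name `multifunctional_fraction`, and the toy identifiers are all assumptions made for illustration, and the abstract does not state how families or function labels were actually defined.

```python
from collections import defaultdict


def multifunctional_fraction(records):
    """Return the fraction of proteins that belong to a fold family
    annotated with more than one distinct function.

    `records` is an iterable of (protein_id, family_id, function_label)
    tuples. This layout is hypothetical, chosen only for the example.
    """
    records = list(records)
    if not records:
        return 0.0

    # Collect the set of distinct function labels seen in each family.
    functions_by_family = defaultdict(set)
    for _, family_id, function_label in records:
        functions_by_family[family_id].add(function_label)

    # A family is "multi-functional" here if it has at least two labels.
    multi_families = {
        family_id
        for family_id, labels in functions_by_family.items()
        if len(labels) > 1
    }

    in_multi = sum(1 for _, family_id, _ in records if family_id in multi_families)
    return in_multi / len(records)


# Toy data (invented): famA carries two functions, famB carries one.
toy_records = [
    ("p1", "famA", "hydrolase"),
    ("p2", "famA", "isomerase"),
    ("p3", "famB", "oxidoreductase"),
    ("p4", "famB", "oxidoreductase"),
]

print(multifunctional_fraction(toy_records))
```

On the toy data, two of the four proteins sit in a family with two distinct labels. For any protein in such a family, knowing the family alone does not determine the function, which is the situation the abstract describes for homology-based assignment.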
