Fold classification based on secondary structure – how much is gained by including loop topology?

Background
It has been proposed that secondary structure information can be used to classify protein folds, at least to some extent. Because this method uses very limited information about the protein structure, it is not surprising that it has a higher error rate than approaches that use a full 3D fold description. On the other hand, comparing 3D protein structures is computationally intensive. This raises the question of to what extent the error rate can be decreased with each new source of information, especially if the new information can still be used with simple alignment algorithms.

We consider whether information about closed loops can improve the accuracy of this approach. While the answer appears to be obvious, we had to overcome two challenges: first, how to encode and compare topological information in such a way that local alignment of strings properly identifies similar structures; second, how to properly measure the effect of the new information in a large data sample. We investigate alternative ways of computing and presenting this information.

Results
We used the set of beta proteins with at most 30% pairwise identity to test the approach. Local alignment scores were used to build a tree of clusters, which was evaluated using a new log-odds cluster scoring function. In particular, we derive a closed formula for the probability of obtaining a given score by chance. The parameters of the local alignment function were optimized using a genetic algorithm. Of the 81 folds that had more than one representative in our data set, log-odds scores registered significantly better clustering in 27 cases and significantly worse clustering in 6 cases, with small differences in the remaining cases. Various notions of significant change and average change were considered and tried, and all results pointed in the same direction.

Conclusion
We found that, on average, properly presented information about loop topology noticeably improves the accuracy of the method, but the benefits, as measured by the log-odds cluster score, vary between fold families.
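
To make the idea of local alignment over structure strings concrete, the following is a minimal sketch of a Smith-Waterman local alignment score. It is not the scoring function used in this work: the one-letter alphabet (E for strand, H for helix, L for loop), the linear gap penalty, and the values of `match`, `mismatch`, and `gap` are illustrative assumptions, whereas the parameters here were tuned with a genetic algorithm and the encoding of loop topology is not reproduced.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    `a` and `b` are strings over an assumed secondary-structure alphabet
    (for example E = strand, H = helix, L = loop). The scoring values are
    placeholders, not the optimized parameters from the paper.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic programming table
    best = 0                    # best local score seen anywhere in the table
    for ca in a:
        curr = [0]              # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart at any cell, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    # Two hypothetical structure strings sharing a strand-loop-strand motif.
    print(local_alignment_score("HHHEEELLEEE", "LLEEELLEEEHH"))
```

Pairwise scores of this kind could then feed a hierarchical clustering step; how the scores are normalized before clustering is left open in this sketch.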
