Systematic comparison of SCOP and CATH: a new gold standard for protein structure analysis

BackgroundSCOP and CATH are widely used as gold standards to benchmark novel protein structure comparison methods as well as to train machine learning approaches for protein structure classification and prediction. The two hierarchies result from different protocols which may result in differing classifications of the same protein. Ignoring such differences leads to problems when being used to train or benchmark automatic structure classification methods. Here, we propose a method to compare SCOP and CATH in detail and discuss possible applications of this analysis.ResultsWe create a new mapping between SCOP and CATH and define a consistent benchmark set which is shown to largely reduce errors made by structure comparison methods such as TM-Align and has useful further applications, e.g. for machine learning methods being trained for protein structure classification. Additionally, we extract additional connections in the topology of the protein fold space from the orthogonal features contained in SCOP and CATH.ConclusionVia an all-to-all comparison, we find that there are large and unexpected differences between SCOP and CATH w.r.t. their domain definitions as well as their hierarchic partitioning of the fold space on every level of the two classifications. A consistent mapping of SCOP and CATH can be exploited for automated structure comparison and classification.AvailabilityBenchmark sets and an interactive SCOP-CATH browser are available at http://www.bio.ifi.lmu.de/SCOPCath.

[1]  Gabrielle A. Reeves,et al.  Structural diversity of domain superfamilies in the CATH database. , 2006, Journal of molecular biology.

[2]  Frances M. G. Pearl,et al.  Quantifying the similarities within fold space. , 2002, Journal of molecular biology.

[3]  Tim J. P. Hubbard,et al.  Data growth and its impact on the SCOP database: new developments , 2007, Nucleic Acids Res..

[4]  Yang Zhang,et al.  The protein structure prediction problem could be solved using the current PDB library. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[5]  Ryan Day,et al.  A consensus view of fold space: Combining SCOP, CATH, and the Dali Domain Dictionary , 2003, Protein science : a publication of the Protein Society.

[6]  John Moult,et al.  A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. , 2005, Current opinion in structural biology.

[7]  Ralf Zimmer,et al.  Vorolign - fast structural alignment using Voronoi contacts , 2007, Bioinform..

[8]  Annabel E. Todd,et al.  Evolution of function in protein superfamilies, from a structural perspective. , 2001, Journal of molecular biology.

[9]  Ralf Zimmer,et al.  Protein structure alignment considering phenotypic plasticity , 2008, ECCB.

[10]  Lukasz A. Kurgan,et al.  PFRES: protein fold classification by using evolutionary information and predicted secondary structure , 2007, Bioinform..

[11]  J. Skolnick,et al.  TM-align: a protein structure alignment algorithm based on the TM-score , 2005, Nucleic acids research.

[12]  Ralf Zimmer,et al.  Profile-Profile Alignment: A Powerful Tool for Protein Structure Prediction , 2002, Pacific Symposium on Biocomputing.

[13]  Jason Weston,et al.  Mismatch string kernels for discriminative protein classification , 2004, Bioinform..

[14]  Ralf Zimmer,et al.  AutoSCOP: automated prediction of SCOP classifications using unique pattern-class mappings , 2007, Bioinform..

[15]  D T Jones,et al.  A systematic comparison of protein structure classifications: SCOP, CATH and FSSP. , 1999, Structure.

[16]  Yang Zhang,et al.  Scoring function for automated assessment of protein structure template quality , 2004, Proteins.

[17]  Adam Godzik,et al.  Fragnostic: walking through protein structure space , 2005, Nucleic Acids Res..

[18]  Stella Veretnik,et al.  Partitioning protein structures into domains: why is it so difficult? , 2006, Journal of molecular biology.

[19]  Adam Godzik,et al.  Flexible structure alignment by chaining aligned fragment pairs allowing twists , 2003, ECCB.

[20]  Lukasz A. Kurgan,et al.  Secondary structure-based assignment of the protein structural classes , 2008, Amino Acids.

[21]  Jason Weston,et al.  SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition , 2007, BMC Bioinformatics.

[22]  Frances M. G. Pearl,et al.  The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution , 2006, Nucleic Acids Res..

[23]  J. Skolnick,et al.  On the origin and highly likely completeness of single-domain protein structures. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[24]  R. Zimmer,et al.  Alternative splicing and protein structure evolution. , 2008, Nucleic acids research.

[25]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..