Touring Protein Space with Matt

Using the Matt structure alignment program, we take a tour of protein space, producing a hierarchical clustering scheme that divides protein structural domains into clusters based on geometric dissimilarity. Purely geometric, distance-based measures of structural similarity, such as Dali/FSSP, were already known to largely replicate hand-curated schemes such as SCOP at the family level, but it remained an open question whether any such scheme could approximate SCOP at the more distant superfamily and fold levels. We answer this question partially in the affirmative by designing a clustering scheme based on Matt that approximately matches SCOP at the superfamily level. We also discuss the implications for the debate over the organization of protein fold space.
