On the evolutionary origins of "Fold Space Continuity": a study of topological convergence and divergence in mixed alpha-beta domains.

Existing protein structure classifications group proteins by overall structural similarity at the highest level and by evolutionary relationship at the lowest level, deriving the higher-level groups by pairwise structure comparison. For this approach to succeed, large changes in structure must be relatively rare in evolution, and proteins with no detectable evolutionary relationship must not converge on similar global chain conformations, since such convergence creates conflicts between structural and evolutionary consistency. We analysed global structural changes using core topological descriptions of 4261 domains from SCOP classes C and D (alpha/beta and alpha+beta), together with new measures of topological distance and classification consistency. The topological consistency of SCOP folds is highly variable: some folds have no consistent description, and there are significant overlaps between groups, including members of separate folds with identical topological descriptions. Topological clustering shows that allowing enough indels to join members of the same family would also merge several distinct folds. We conclude that evolutionary changes in the global topology of protein domains are the root cause of many of the difficulties faced by current structure classification methods based on pairwise comparison. As a resolution, we propose creating a purely structural classification using an approach similar to that adopted by the Gene Ontology, in which proteins are assigned labels that describe their structure.
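The trade-off between indel tolerance and fold merging can be illustrated with a minimal sketch. It assumes a hypothetical, much-simplified encoding in which each domain's core topology is a string of secondary-structure symbols ('a' for a helix, 'b' for a strand). It uses an indel-only edit distance (insertions plus deletions, computed from the longest common subsequence) as a stand-in for a topological distance, and applies single-linkage clustering at a chosen indel threshold. The domain names, strings and fold labels are invented toy data, not SCOP descriptions, and none of this is the distance or consistency measure used in the study.

```python
from itertools import combinations


def indel_distance(a: str, b: str) -> int:
    """Insertions plus deletions needed to turn a into b (no substitutions).

    Equals len(a) + len(b) - 2 * LCS(a, b), where LCS is the length of the
    longest common subsequence, computed here with a rolling DP row.
    """
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return len(a) + len(b) - 2 * prev[-1]


def cluster(topologies: dict[str, str], max_indels: int) -> list[set[str]]:
    """Single-linkage clusters: two domains are linked if within max_indels."""
    parent = {name: name for name in topologies}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(topologies, 2):
        if indel_distance(topologies[a], topologies[b]) <= max_indels:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in topologies:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def fold_consistency(clusters: list[set[str]], fold_of: dict[str, str]):
    """Return (folds split over several clusters, clusters mixing folds)."""
    split = {
        f for f in set(fold_of.values())
        if sum(any(fold_of[d] == f for d in c) for c in clusters) > 1
    }
    mixed = [c for c in clusters if len({fold_of[d] for d in c}) > 1]
    return split, mixed


# Hypothetical toy data: two invented "folds", X and Y, with two members each.
toy = {"x1": "babab", "x2": "bababab", "y1": "babba", "y2": "bbabba"}
fold_of = {"x1": "X", "x2": "X", "y1": "Y", "y2": "Y"}

for k in (1, 2):
    split, mixed = fold_consistency(cluster(toy, k), fold_of)
    print(k, sorted(split), [sorted(c) for c in mixed])
# 1 ['X'] []
# 2 [] [['x1', 'x2', 'y1', 'y2']]
```

With one indel allowed, the two members of toy fold X stay apart; allowing two joins them, but the same threshold also pulls in both fold Y domains, a miniature version of the conflict described above.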
