Comparison of sequence-based and structure-based phylogenetic trees of homologous proteins: Inferences on protein evolution

Several studies based on the known three-dimensional (3-D) structures of proteins show that two homologous proteins with insignificant sequence similarity can adopt a common fold and may perform the same or similar biochemical functions. Hence, it is appropriate to use similarities in the 3-D structures of proteins, rather than amino acid sequence similarities, in modelling the evolution of distantly related proteins. Here we present an assessment of the use of 3-D structures in modelling the evolution of homologous proteins. Using a dataset of 108 protein domain families of known structure, with at least 10 members per family, we compare the extent of structural and sequence dissimilarity among pairs of proteins, which are the inputs to the construction of phylogenetic trees. We find that the correlation between the structure-based and the sequence-based dissimilarity measures is usually good if the sequence similarity among the homologues is about 30% or more. For protein families with low sequence similarity among the members, the correlation between the sequence-based and the structure-based dissimilarities is poor. In these cases the structure-based dendrogram clusters proteins with the most similar biochemical functional properties better than the sequence-similarity-based dendrogram does. In multi-domain protein families and disulphide-rich protein families, the correlation coefficient for the match of sequence-based and structure-based dissimilarity (SDM) measures can be poor even though the sequence identity may be higher than 30%. Hence it is suggested that protein evolution is best modelled using 3-D structures if the sequence similarities (SSM) of the homologues are very low.
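The per-family comparison described above can be illustrated with a minimal sketch. It assumes that each family is summarised by two symmetric pairwise dissimilarity matrices, one sequence-based and one structure-based, and that agreement is measured by a Pearson correlation over the unique protein pairs. The function names and the toy matrices are hypothetical, and the abstract does not specify how the dissimilarities themselves were computed.

```python
import math
from itertools import combinations


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries of a square matrix,
    i.e. one value per unordered pair of proteins."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("dissimilarity matrix must be square")
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def dissimilarity_correlation(seq_dist, struct_dist):
    """Correlate sequence-based and structure-based pairwise dissimilarities
    for one family (assumed: both matrices list members in the same order)."""
    return pearson(upper_triangle(seq_dist), upper_triangle(struct_dist))


# Toy, made-up values for a three-member family (not data from the study).
toy_seq = [
    [0.0, 0.2, 0.6],
    [0.2, 0.0, 0.8],
    [0.6, 0.8, 0.0],
]
toy_struct = [
    [0.0, 0.5, 1.3],
    [0.5, 0.0, 1.7],
    [1.3, 1.7, 0.0],
]
print(round(dissimilarity_correlation(toy_seq, toy_struct), 3))
```

A coefficient near 1 for a family would indicate that the two kinds of dissimilarity rank protein pairs alike, so trees built from either matrix are likely to agree; a low coefficient would flag families where the sequence-based and structure-based trees may differ.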
