Analysis of protein homology by assessing the (dis)similarity in protein loop regions

Two proteins are considered to have a similar fold if sufficiently many of their secondary structure elements are positioned similarly in space and are connected in the same order. Such a common structural scaffold may arise through either divergent or convergent evolution. The intervening unaligned regions (“loops”) between the superimposable helices and strands can exhibit a wide range of similarity and may offer clues to the structural evolution of folds. One might expect closely related proteins to differ less in their nonconserved loop regions than distantly related proteins, and loop regions in structurally similar but unrelated proteins (analogs) to vary more than those in homologs. Here we introduce a new measure of structural (dis)similarity in loop regions that is based on the concept of the Hausdorff metric. The measure is used to gauge protein relatedness and is tested on a benchmark of homologous and analogous protein structures. We show that the new measure distinguishes homologous from analogous proteins with the same or higher accuracy than conventional measures based on comparing proteins in structurally aligned regions. We argue that this result can be attributed to the higher sensitivity of the Hausdorff (dis)similarity measure to particularly evident structural dissimilarities, and we draw some conclusions about the evolutionary relatedness of proteins in the most populated protein folds. Proteins 2004. © 2004 Wiley‐Liss, Inc.
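To make the Hausdorff concept concrete, the sketch below computes the standard symmetric Hausdorff distance between two small sets of 3D points. It illustrates the textbook definition only and is not the loop (dis)similarity measure introduced in the paper, whose exact form is not given in this abstract. The function names and the coordinates (imagined here as C-alpha positions in angstroms) are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical loop coordinates (assumed for illustration, not from the paper).
loop_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
# Same path with one extra residue bulging 3 angstroms away from it.
loop_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.0, 0.0), (7.6, 0.0, 0.0)]

print(directed_hausdorff(loop_a, loop_b))  # every point of loop_a also lies in loop_b
print(hausdorff(loop_a, loop_b))           # set by the single bulging point
```

Because the distance is a maximum over nearest-neighbor distances, a single outlying point determines the result. This is the property the abstract appeals to when it describes sensitivity to particularly evident dissimilarities.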
