Domain architecture conservation in orthologs

Background

As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence.

To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of domain architecture changes between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs.

Results

The analysis shows that, for homologs that have diverged beyond a certain threshold, orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for. We interpret this as an indication of a stronger selective pressure on orthologs than on paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation.

The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent.

Conclusions

On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than between other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.
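
To make the idea of comparing domain architectures concrete, the sketch below treats an architecture as an ordered list of domain identifiers and scores two architectures by their longest common subsequence, normalized by their combined length. This is an illustrative stand-in only: the abstract does not specify how the study's similarity score is defined, and the function names (`lcs_length`, `architecture_similarity`) and the example domain labels are assumptions made for this sketch.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two domain lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def architecture_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical similarity in [0, 1]; NOT the score used in the study.

    1.0 means the same domains in the same order; inserted, deleted,
    or reordered domains lower the value.
    """
    if not a and not b:
        return 1.0  # two domain-free proteins are treated as identical
    return 2 * lcs_length(a, b) / (len(a) + len(b))


# Illustrative domain labels, listed N- to C-terminal (assumed example data).
protein_a = ["SH3", "SH2", "Pkinase"]
protein_b = ["SH2", "Pkinase"]          # one domain missing
protein_c = ["SH2", "SH3", "Pkinase"]   # two domains swapped

print(architecture_similarity(protein_a, protein_a))
print(architecture_similarity(protein_a, protein_b))
print(architecture_similarity(protein_a, protein_c))
```

Because the measure is order-sensitive, it penalizes both a domain insertion/deletion and a domain shuffle, which are among the event types distinguished above; separating those event types would need additional logic that is not shown here.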
