Emergence of novel domains in proteins

Background: Proteins are composed of combinations of discrete, well-defined sequence domains that are associated with specific functions and that have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation. However, little is currently known about how novel domains arise and how they subsequently evolve.

Results: To gain insight into the impact of recently emerged domains on protein evolution, we identified all young human protein domains, defined as those that emerged in approximately the past 550 million years. We classified them into vertebrate-specific and mammalian-specific groups and compared them to older domains. We found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. Of these, 61.3% arose in newly formed genes, while the remaining 38.7% are found combined with older domains and very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, in comparisons of human and mouse orthologous sequences, young domains show significantly higher ratios of non-synonymous to synonymous substitution rates than older domains. This also holds when we compare young and old domains located in the same protein, suggesting that recently arisen domains tend to evolve under weaker constraint than older domains.

Conclusions: We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made up of domains of different ages, and that the fastest-evolving parts correspond to the domains that were acquired most recently.
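The within-protein comparison described in the Results can be illustrated with a minimal sketch. It assumes each domain has already been assigned an age class and a dN/dS estimate. The records, the `paired_differences` helper, and all numeric values below are hypothetical and serve only to show the shape of the comparison; they are not data or code from the study.

```python
from statistics import median

# Hypothetical toy records: (protein_id, domain_name, age_class, dN/dS).
# All values are invented for illustration; they are not results from the study.
domains = [
    ("P1", "domA", "old", 0.10),
    ("P1", "domB", "young", 0.30),
    ("P2", "domC", "old", 0.20),
    ("P2", "domD", "young", 0.15),
    ("P3", "domE", "old", 0.05),
    ("P3", "domF", "young", 0.25),
    ("P4", "domG", "young", 0.40),  # contains only a young domain, so it cannot be paired
]


def paired_differences(records):
    """Map each mixed-age protein to (median young dN/dS - median old dN/dS)."""
    by_protein = {}
    for protein, _name, age, omega in records:
        groups = by_protein.setdefault(protein, {"young": [], "old": []})
        groups[age].append(omega)
    # Keep only proteins that carry at least one young and one old domain.
    return {
        protein: median(groups["young"]) - median(groups["old"])
        for protein, groups in by_protein.items()
        if groups["young"] and groups["old"]
    }


diffs = paired_differences(domains)
faster_young = sum(1 for d in diffs.values() if d > 0)
print(f"{faster_young} of {len(diffs)} mixed-age proteins have faster-evolving young domains")
```

Pairing domains within the same protein controls for gene-level factors such as expression level, so a positive difference points to the domain's age rather than to the gene it sits in. A real analysis would follow the count with a paired statistical test.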
