Dynamics and Adaptive Benefits of Protein Domain Emergence and Arrangements during Plant Genome Evolution

Plant genomes are generally very large, mostly paleopolyploid, and have numerous gene duplicates and complex genomic features such as repeats and transposable elements. Many of these features have been hypothesized to enable plants, which cannot easily escape environmental challenges, to adapt rapidly. Another mechanism, which has recently been well described as a major facilitator of rapid adaptation in bacteria, animals, and fungi but not yet in plants, is the modular rearrangement of protein-coding genes. Due to the high precision of profile-based methods, rearrangements can be well captured at the protein level by characterizing the emergence, loss, and rearrangement of protein domains, the structural, functional, and evolutionary building blocks of proteins. Here, we study the dynamics of domain rearrangements and explore their adaptive benefit in 27 plant and 3 algal genomes. We use a phylogenomic approach by which we can explain the formation of 88% of all arrangements by single-step events, such as fusion, fission, and terminal loss of domains. We find that many domains are lost along every lineage, but at least 500 domains are novel, that is, they are unique to green plants and emerged relatively recently. These novel domains duplicate and rearrange more readily within their genomes than ancient domains and are disproportionately involved in stress response and developmental innovations. Novel domains more often affect regulatory proteins and show a higher degree of structural disorder than ancient domains. Whereas a relatively large and well-conserved core set of single-domain proteins exists, long multi-domain arrangements tend to be species-specific. We find that duplicated genes are more often involved in rearrangements. Although fission events typically affect metabolic proteins, fusion events often create new signaling proteins essential for environmental sensing.
Taken together, the high volatility of single domains and complex arrangements in plant genomes demonstrates the importance of modularity for the environmental adaptability of plants.
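
To make the notion of a single-step event concrete, the following minimal Python sketch classifies one domain arrangement at a descendant node against the arrangements inferred for its ancestor. It is an illustration under simplifying assumptions, not the pipeline used in the study: arrangements are modeled as non-empty tuples of domain names, the function name `classify_arrangement` and the domain labels are hypothetical, and only fusion of two ancestral arrangements, fission, and terminal loss are considered.

```python
from typing import Set, Tuple

Arrangement = Tuple[str, ...]  # ordered domain names, N- to C-terminus


def classify_arrangement(
    arrangement: Arrangement,
    ancestral: Set[Arrangement],
    descendant: Set[Arrangement],
) -> str:
    """Label a descendant arrangement with one single-step explanation.

    Simplified rules (assumptions for illustration only):
      conserved      arrangement already present in the ancestor
      fusion         concatenation of two ancestral arrangements
      fission        terminal piece of an ancestral arrangement whose
                     remaining piece also exists in the descendant
      terminal_loss  terminal piece of an ancestral arrangement whose
                     remaining piece does not exist in the descendant
      unexplained    none of the above
    """
    if arrangement in ancestral:
        return "conserved"

    # Fusion: try every split point of the arrangement.
    for i in range(1, len(arrangement)):
        if arrangement[:i] in ancestral and arrangement[i:] in ancestral:
            return "fusion"

    # Fission or terminal loss: the arrangement is a proper prefix or
    # suffix of a longer ancestral arrangement.
    n = len(arrangement)
    remainders = []
    for anc in ancestral:
        if len(anc) <= n:
            continue
        if anc[:n] == arrangement:
            remainders.append(anc[n:])
        if anc[-n:] == arrangement:
            remainders.append(anc[:-n])
    if any(rest in descendant for rest in remainders):
        return "fission"
    if remainders:
        return "terminal_loss"
    return "unexplained"


# Hypothetical toy data; the domain names carry no biological meaning.
ancestral = {("A",), ("B", "C"), ("D", "E", "F"), ("H", "I")}
descendant = {("A",), ("A", "B", "C"), ("D", "E"), ("F",), ("H",), ("G",)}

labels = {
    arr: classify_arrangement(arr, ancestral, descendant)
    for arr in descendant
}
single_step = {"fusion", "fission", "terminal_loss"}
new_arrangements = [a for a, lab in labels.items() if lab != "conserved"]
explained = sum(labels[a] in single_step for a in new_arrangements)
fraction_explained = explained / len(new_arrangements)
```

In this toy setting, `fraction_explained` is the share of non-conserved arrangements that a single event accounts for, which is the kind of quantity the 88% figure above refers to. A full analysis would additionally need a species tree, ancestral state reconstruction, and rules for ambiguous cases, all of which are omitted here.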
