Evolution at the Subgene Level: Domain Rearrangements in the Drosophila Phylogeny

Although the possibility of gene evolution by domain rearrangement has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and, in some cases, horizontal transfer. Within the Drosophila clade, however, we find that domain rearrangements occur in 35.9% of gene families, so any comprehensive study of gene evolution in these species must account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangement and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that both occur predominantly alongside duplication: 92.5% of fusion events and 34.3% of fission events retain the ancestral architecture in a duplicated copy. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results considerably expand our understanding of evolution at the subgene level and offer several insights into how new genes and functions arise between species.
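To make the idea of a maximum parsimony history concrete, the following is a minimal sketch of Fitch's small-parsimony procedure applied to a single presence/absence character (whether a domain occurs in a gene) on a rooted binary species tree. It is not the algorithm presented in this work, which additionally models domain generation, duplication, merge, and split events. The function name `fitch`, the nested-tuple tree encoding, the species abbreviations, and the presence values are all hypothetical and chosen only for illustration.

```python
def fitch(tree, leaf_states):
    """Fitch small parsimony for one character on a rooted binary tree.

    tree        -- a leaf name (str) or a pair (left_subtree, right_subtree)
    leaf_states -- dict mapping leaf name -> observed state (here 0 or 1)

    Returns (candidate_states_at_root, minimum_number_of_state_changes).
    """
    if isinstance(tree, str):
        # A leaf: its state set is just the observed state, with no changes.
        return {leaf_states[tree]}, 0

    left, right = tree
    left_set, left_changes = fitch(left, leaf_states)
    right_set, right_changes = fitch(right, leaf_states)

    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no change needed here.
        return shared, left_changes + right_changes
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Toy (hypothetical) data: a five-species tree and the presence (1) or
# absence (0) of one domain in the corresponding gene of each species.
toy_tree = ((("dmel", "dsim"), "dana"), ("dvir", "dmoj"))
toy_presence = {"dmel": 1, "dsim": 1, "dana": 0, "dvir": 0, "dmoj": 0}

root_states, n_changes = fitch(toy_tree, toy_presence)
print(root_states, n_changes)
```

In this toy case the most parsimonious explanation is a single domain gain on the branch leading to the common ancestor of `dmel` and `dsim`, with the domain absent at the root. A model of the kind described above would have to go further and decide, for example, whether such a gain reflects a newly generated domain or a merge with a domain from another gene.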
