Detection of Fused Genes in Eukaryotic Genomes using Gene deFuser: Analysis of the Tetrahymena thermophila genome

Background: Fused genes are important sources of data for studies of evolution and protein function. To date, no service has been made available online to aid in the large-scale identification of fused genes in sequenced genomes. We have developed a program, Gene deFuser, that analyzes uploaded protein sequence files for characteristics of gene fusion events and presents the results in a convenient web interface.

Results: To test the ability of this software to detect fusions on a genome-wide scale, we analyzed the 24,725 gene models predicted for the ciliated protozoan Tetrahymena thermophila. Gene deFuser detected members of eight of the nine families of gene fusions known or predicted in this species and identified nineteen new families of fused genes, each containing between one and twelve members. In addition to these genuine fusions, Gene deFuser also detected a particular type of gene misannotation, in which two independent genes were predicted as a single transcript by gene annotation tools. Twenty-nine of the artifacts detected by Gene deFuser in the initial annotation have been corrected in subsequent versions, with a total of 25 annotation artifacts (about 1/3 of the total fusions identified) remaining in the most recent annotation.

Conclusions: The newly identified Tetrahymena fusions belong to classes of genes involved in processes such as phospholipid synthesis, nuclear export, and surface antigen generation. These results highlight the potential of Gene deFuser to reveal a large number of novel fused genes in evolutionarily isolated organisms. Gene deFuser may also prove useful as an ancillary tool for detecting fusion artifacts during gene model annotation.
