PhyloSort: a user-friendly phylogenetic sorting tool and its application to estimating the cyanobacterial contribution to the nuclear genome of Chlamydomonas

Background: Phylogenomic pipelines generate large collections of phylogenetic trees that require manual inspection to answer questions about gene or genome evolution. A notable application of phylogenomics is to photosynthetic organelle (plastid) endosymbiosis. In the case of primary endosymbiosis, a heterotrophic protist engulfed a cyanobacterium, giving rise to the first photosynthetic eukaryote. Plastid establishment precipitated extensive gene transfer from the endosymbiont to the nuclear genome of the 'host'. Estimating the magnitude of this endosymbiotic gene transfer (EGT) and determining the functions of the prokaryotic genes remain controversial issues. We used phylogenomics to study EGT in the model green alga Chlamydomonas reinhardtii. To facilitate this procedure, we developed PhyloSort to rapidly search large collections of trees for monophyletic relationships. Here we present PhyloSort and its application to estimating EGT in Chlamydomonas.

Results: PhyloSort is an open-source tool that sorts phylogenetic trees by searching for user-specified subtrees containing a monophyletic group of interest, defined by operational taxonomic units in a phylogenomic context. Using PhyloSort, we identified 897 Chlamydomonas genes of putative cyanobacterial origin, of which 531 had bootstrap support values ≥ 50% for the grouping of the algal and cyanobacterial homologs.

Conclusion: PhyloSort can be applied to quantify the number of genes that support different evolutionary hypotheses, such as a taxonomic classification or endosymbiotic or horizontal gene transfer events. In our application, we demonstrate that cyanobacteria account for 3.5–6% of the protein-coding genes in the nuclear genome of Chlamydomonas.
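The kind of tree sorting described above can be illustrated with a minimal Python sketch. It is not PhyloSort's implementation: the function names, the `Group_gene` leaf-naming scheme, and the example trees are all hypothetical, and the sketch ignores bootstrap support values. It treats each tree as unrooted and asks whether one side of some branch contains only leaves from the groups of interest, with at least one leaf from each group.

```python
import re

PUNCT = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse a Newick string into nested lists; leaves are name strings.
    Branch lengths and internal labels (e.g. support values) are discarded."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            children = [parse_node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(parse_node())
            pos += 1  # skip the closing ")"
            # Skip an optional internal label / support / branch length.
            if pos < len(tokens) and tokens[pos] not in PUNCT:
                pos += 1
            return children
        name = tokens[pos].split(":")[0]
        pos += 1
        return name

    return parse_node()

def leaf_sets(node, out):
    """Return the leaves under `node`; append every clade's leaf set to `out`."""
    if isinstance(node, str):
        leaves = frozenset([node])
    else:
        leaves = frozenset().union(*(leaf_sets(child, out) for child in node))
    out.append(leaves)
    return leaves

def has_exclusive_clade(newick, groups, taxon_of=lambda leaf: leaf.split("_")[0]):
    """True if one side of some branch holds only leaves from `groups`
    and at least one leaf from each group (tree treated as unrooted)."""
    clades = []
    all_leaves = leaf_sets(parse_newick(newick), clades)
    wanted = set(groups)
    for clade in clades:
        for side in (clade, all_leaves - clade):
            # Ignore the empty side and the trivial "whole tree" side.
            if side and side != all_leaves:
                if {taxon_of(leaf) for leaf in side} == wanted:
                    return True
    return False

# Hypothetical gene trees; leaf names follow an assumed "Group_gene" scheme.
tree_a = ("((Chlamy_g1:0.1,(Cyano_Nostoc:0.2,Cyano_Synech:0.2)90:0.1)75:0.3,"
          "(Arabid_g7:0.2,Homo_g3:0.4)60:0.2,Yeast_g9:0.5);")
tree_b = ("((Chlamy_g1:0.1,Arabid_g7:0.2)80:0.1,"
          "(Cyano_Nostoc:0.2,Cyano_Synech:0.2)90:0.3,Yeast_g9:0.5);")

print(has_exclusive_clade(tree_a, ["Chlamy", "Cyano"]))  # True
print(has_exclusive_clade(tree_b, ["Chlamy", "Cyano"]))  # False
```

Applying such a test to every tree in a collection and counting the matches gives the kind of tally reported in the Results. A fuller version would also read the support value on the matching branch so that a threshold such as the ≥ 50% bootstrap cutoff could be enforced.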
