Exploration of phylogenetic data using a global sequence analysis method

Background
Molecular phylogenetic methods are based on alignments of nucleotide or amino acid sequences. The tremendous increase in molecular data permits phylogenetic analyses of very long sequences and of many species, but it also calls for methods that help manage large datasets.

Results
Here we explore the phylogenetic signal present in molecular data by means of genomic signatures, defined as the set of frequencies of short oligonucleotides present in DNA sequences. Although it violates many of the standard assumptions of traditional phylogenetic analyses, in particular the explicit statements of homology inherent in character matrices, the signature approach permits the analysis of very long sequences, even unalignable ones, and is therefore most useful in cases where alignment is questionable. We compare the results obtained by traditional phylogenetic methods with those inferred by the signature method for two genes: RAG1, which is easily alignable, and 18S rRNA, for which alignments of some regions are often ambiguous. We also apply the method to a multigene data set of 33 genes from 9 bacterial species and one archaeal species, as well as to the whole genomes of 16 γ-proteobacteria. In addition to yielding phylogenetic results comparable to those of traditional methods, the comparison of signatures for the sequences in the bacterial example identified putative candidates for horizontal gene transfer.

Conclusion
The signature method is therefore a fast tool for exploring phylogenetic data: it provides not only a pretreatment for discovering new sequence relationships but also a means of identifying cases of sequence evolution that could confound traditional phylogenetic analysis.
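Because a genomic signature is simply a vector of word frequencies, it can be computed and compared without any alignment. The Python sketch below illustrates the idea under several assumptions that the abstract does not state: overlapping words of length k (default k = 4), words containing symbols other than A, C, G, T are skipped, signatures are compared by Euclidean distance, and a gene is flagged as a candidate horizontal transfer when its distance to the whole-genome signature exceeds the mean by more than z standard deviations. These choices and the function names `signature`, `distance`, `distance_matrix` and `atypical_genes` are hypothetical and are not necessarily those used by the authors.

```python
from itertools import product
from math import sqrt


def signature(seq: str, k: int = 4) -> dict[str, float]:
    """Return the relative frequency of every k-letter word over A, C, G, T.

    Overlapping words are counted; any word containing another symbol
    (N, gap, IUPAC ambiguity code) is skipped. Both choices are assumptions.
    """
    seq = seq.upper()
    # Start every possible word at zero so all signatures share the same 4**k keys.
    counts = {"".join(w): 0 for w in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # words with non-ACGT symbols are not keys, so they are skipped
            counts[word] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid k-letter word")
    return {w: c / total for w, c in counts.items()}


def distance(sig_a: dict[str, float], sig_b: dict[str, float]) -> float:
    """Euclidean distance between two signatures built with the same k (assumed metric)."""
    return sqrt(sum((sig_a[w] - sig_b[w]) ** 2 for w in sig_a))


def distance_matrix(seqs: dict[str, str], k: int = 4) -> tuple[list[str], list[list[float]]]:
    """Pairwise signature distances; no alignment is needed at any step."""
    names = sorted(seqs)
    sigs = {n: signature(seqs[n], k) for n in names}
    return names, [[distance(sigs[a], sigs[b]) for b in names] for a in names]


def atypical_genes(genes: dict[str, str], genome: str, k: int = 4, z: float = 2.0) -> list[str]:
    """Hypothetical screen: flag genes whose signature lies unusually far from the genome's.

    A gene is flagged when its distance to the whole-genome signature exceeds
    the mean distance by more than z population standard deviations.
    """
    genome_sig = signature(genome, k)
    dists = {name: distance(signature(s, k), genome_sig) for name, s in genes.items()}
    mean = sum(dists.values()) / len(dists)
    sd = sqrt(sum((d - mean) ** 2 for d in dists.values()) / len(dists))
    return [name for name, d in dists.items() if sd > 0 and (d - mean) / sd > z]


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    toy = {
        "seq1": "ATGGCGTACGTTAGCCTAGGCTAACGTTAGC",
        "seq2": "ATGGCGTACGTTAGCCTAGGATAACGTTAGC",
        "seq3": "GGCCGGCCGCGCGGCCGCGGCCGGCGCGCCG",
    }
    names, d = distance_matrix(toy, k=2)
    for name, row in zip(names, d):
        print(name, " ".join(f"{x:.3f}" for x in row))
```

The resulting distance matrix can be handed to any distance-based tree-building method, such as neighbor joining. The `atypical_genes` rule is only a crude screen for candidate transfers, not a statistical test, and its threshold `z` would need calibration on real data.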