A Phylogenetic Analysis of the Brassicales Clade Based on an Alignment-Free Sequence Comparison Method

Phylogenetic analyses reveal the evolutionary relationships of species. A phylogenetic tree can be inferred from multiple sequence alignments of proteins or genes. However, aligning the whole genome sequences of higher eukaryotes is a computationally intensive and ambitious task, as is computing phylogenetic trees from such alignments. To overcome these limitations, we used an alignment-free method to compare genomes of the Brassicales clade. For each nucleotide sequence, a Chaos Game Representation (CGR) can be computed, which represents each nucleotide of the sequence as a point in a square whose four vertices correspond to the four nucleotides. Each CGR is therefore a unique fingerprint of the underlying sequence. If the CGR is divided by grid lines, each grid square counts the occurrences of one specific oligonucleotide of a given length in the sequence (Frequency Chaos Game Representation, FCGR). Here, we used distance measures between FCGRs to infer phylogenetic trees of Brassicales species. Three types of data were analyzed because of their different characteristics: (A) whole genome assemblies, as far as available, for species belonging to the Malvidae taxon; (B) EST (expressed sequence tag) data of species of the Brassicales clade; and (C) mitochondrial genomes of the Rosids branch, a supergroup of the Malvidae. The trees reconstructed from Euclidean distances between FCGRs are in general agreement with single-gene trees. The Fitch–Margoliash and Neighbor-Joining algorithms yielded similar or identical trees. For the first time, we applied the bootstrap resampling concept to trees based on FCGRs to determine the support of the branchings. FCGRs are fast to calculate and can complement alignment-based data and morphological characteristics to improve the phylogenetic classification of species in ambiguous cases.
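
The CGR and FCGR construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The corner layout (A=(0,0), C=(0,1), G=(1,1), T=(1,0)) is one commonly used convention, and three further choices are assumptions: skipping k-mers that contain ambiguous bases, normalising counts to relative frequencies so that sequences of different length are comparable, and the hypothetical function names `cgr_points`, `fcgr` and `fcgr_distance`.

```python
import math

# Assumed corner layout: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """CGR: start at the centre of the unit square and, for each nucleotide,
    move halfway towards that nucleotide's corner (one point per nucleotide)."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNER[base]  # raises KeyError on non-ACGT characters
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr(seq, k):
    """FCGR: relative k-mer frequencies on a 2^k x 2^k grid, grid[row][col].

    Dividing the CGR square into a 2^k x 2^k grid puts each point into the
    cell fixed by the last k nucleotides read; the most recent nucleotide
    picks the coarsest quadrant. Counting k-mers with that bit layout gives
    the same grid without floating-point rounding.
    """
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNER for base in kmer):
            continue  # assumption: skip windows containing N or other ambiguity codes
        col = row = 0
        for j, base in enumerate(kmer):  # j = 0 is the oldest nucleotide
            cx, cy = CORNER[base]
            col |= cx << j
            row |= cy << j
        counts[row][col] += 1
        total += 1
    total = max(total, 1)  # all-zero grid if no valid k-mer was found
    return [[c / total for c in r] for r in counts]


def fcgr_distance(a, b):
    """Euclidean distance between two FCGRs built with the same k."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return math.dist(flat_a, flat_b)
```

Computing `fcgr_distance(fcgr(s1, k), fcgr(s2, k))` for every pair of sequences gives the distance matrix that a tree-building method consumes. The grid resolution k used in the study is not stated here, so any particular value is an illustrative choice.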

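Neighbor-Joining is one of the two tree-building algorithms applied to the FCGR distance matrices. For reference, the following is a minimal Python sketch of the standard Neighbor-Joining algorithm of Saitou and Nei operating on a symmetric distance matrix. It is not the software used in the study, and the function name `neighbor_joining` and the Newick output format are illustrative assumptions.

```python
def neighbor_joining(names, dist):
    """Neighbor-Joining on a symmetric n x n distance matrix.

    Returns an unrooted tree in Newick format with branch lengths.
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    # d[i][j]: current distances between active clusters, keyed by cluster id
    d = {i: {j: float(dist[i][j]) for j in range(n)} for i in range(n)}
    newick = {i: names[i] for i in range(n)}
    active = list(range(n))
    next_id = n
    while len(active) > 3:
        r = len(active)
        row_sum = {i: sum(d[i][j] for j in active if j != i) for i in active}
        # Choose the pair minimising Q(i, j) = (r - 2) d(i, j) - R_i - R_j
        best = None
        for a in range(r):
            for b in range(a + 1, r):
                i, j = active[a], active[b]
                q = (r - 2) * d[i][j] - row_sum[i] - row_sum[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and j
        li = 0.5 * d[i][j] + (row_sum[i] - row_sum[j]) / (2 * (r - 2))
        lj = d[i][j] - li
        u = next_id
        next_id += 1
        d[u] = {}
        for m in active:
            if m != i and m != j:
                d[u][m] = d[m][u] = 0.5 * (d[i][m] + d[j][m] - d[i][j])
        newick[u] = f"({newick[i]}:{li:.4g},{newick[j]}:{lj:.4g})"
        active = [m for m in active if m != i and m != j] + [u]
    # Three clusters left: join them at a single central node
    i, j, k = active
    li = 0.5 * (d[i][j] + d[i][k] - d[j][k])
    lj = 0.5 * (d[i][j] + d[j][k] - d[i][k])
    lk = 0.5 * (d[i][k] + d[j][k] - d[i][j])
    return f"({newick[i]}:{li:.4g},{newick[j]}:{lj:.4g},{newick[k]}:{lk:.4g});"
```

When the distances are not perfectly tree-like, as is typical for FCGR distances, Neighbor-Joining can produce negative branch lengths. This sketch reports them unchanged rather than clamping them to zero.
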