DLTree: efficient and accurate phylogeny reconstruction using the dynamical language method

Summary: Many alignment-free methods have been proposed for phylogeny reconstruction over the past two decades. However, these methods face long-standing challenges, including heavy demands on computer memory and CPU time and the presence of duplicate computations. In this article, we address these challenges with three ideas: compressed vectors, fingerprints and scalable memory management. Building on these ideas, we developed the DLTree algorithm for efficient implementation of the dynamical language model and whole genome-based phylogenetic analysis. In comparisons with other alignment-free tools, the DLTree algorithm was more efficient and more accurate for phylogeny reconstruction.

Availability and Implementation: The DLTree algorithm is freely available at http://dltree.xtu.edu.cn

Contact: yuzuguo@aliyun.com or yangjy@nankai.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.
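To make the idea of a compressed vector more concrete, the following is a minimal sketch of an alignment-free comparison that stores only the k-mers actually observed in a sequence. It is not the DLTree implementation. The function names (`kmer_freqs`, `composition_vector`, `distance`), the formula for the expected frequency `q`, and the correlation-based distance are all assumptions borrowed from the general composition-vector literature, and the summary above does not specify any of them. Keeping only observed k-mers is a further simplification made for brevity.

```python
from collections import Counter
from math import sqrt


def kmer_freqs(seq, k):
    """Relative frequencies of the k-mers in seq, stored sparsely as a dict."""
    n = len(seq) - k + 1
    if n <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(n))
    return {word: c / n for word, c in counts.items()}


def composition_vector(seq, k):
    """Sparse vector of relative deviations p/q - 1 for the observed k-mers.

    Assumption: the expected frequency q of a word s1..sk is the average of
    p(s1..s[k-1]) * p(sk) and p(s1) * p(s2..sk). Words that never occur in
    seq are simply not stored (a simplification of this sketch).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    p_k = kmer_freqs(seq, k)
    p_k1 = kmer_freqs(seq, k - 1)
    p_1 = kmer_freqs(seq, 1)
    vec = {}
    for word, p in p_k.items():
        q = 0.5 * (p_k1[word[:-1]] * p_1[word[-1]]
                   + p_1[word[0]] * p_k1[word[1:]])
        vec[word] = p / q - 1.0
    return vec


def distance(u, v):
    """Assumed distance (1 - C) / 2, where C is the cosine of the angle
    between the two sparse vectors; only shared keys contribute to the dot
    product."""
    dot = sum(x * v[word] for word, x in u.items() if word in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cannot compare a zero vector")
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    a = composition_vector("ACGTACGGTCATTGACC", 3)
    b = composition_vector("ACGTTCGGACATTGGCC", 3)
    print(distance(a, b))
```

A dictionary keyed by observed words avoids allocating all 4^k (DNA) or 20^k (protein) entries, which is the kind of memory saving a compressed representation aims for. A pairwise distance matrix built this way could then be passed to a standard tree-building method such as neighbor joining.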
