Ksak: A high-throughput tool for alignment-free phylogenetics

Phylogenetic tools are fundamental to the study of evolutionary relationships. In this paper, we present Ksak, a novel high-throughput tool for alignment-free phylogenetic analysis. Ksak computes the pairwise distance matrix between molecular sequences using seven widely accepted k-mer-based distance measures. From this distance matrix, Ksak constructs the phylogenetic tree with standard algorithms. When benchmarked on a gold-standard 16S rRNA dataset, Ksak was the most accurate of the five tools compared and was 19% more accurate than ClustalW2, a high-accuracy multiple sequence aligner. Moreover, Ksak was tens to hundreds of times faster than ClustalW2, which helps overcome the computational bottleneck currently encountered in large-scale multiple sequence alignment. Ksak is freely available at https://github.com/labxscut/ksak.
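To make the idea of a k-mer-based pairwise distance matrix concrete, the following is a minimal Python sketch, not Ksak's implementation. It assumes the Euclidean distance between k-mer frequency profiles as the measure, chosen here only for illustration since the abstract does not name the seven measures. The function names (kmer_frequencies, euclidean_distance, distance_matrix) and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k):
    """Return the relative frequency of each overlapping k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:  # sequence shorter than k
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def euclidean_distance(p, q):
    """Euclidean distance between two sparse k-mer frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def distance_matrix(seqs, k):
    """Symmetric pairwise distance matrix over a list of sequences."""
    profiles = [kmer_frequencies(s, k) for s in seqs]
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean_distance(profiles[i], profiles[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    # Toy sequences (illustrative only, not from the benchmark dataset).
    toy = ["ACGTACGTAC", "ACGTACGTAC", "GGGGGGGGGG"]
    for row in distance_matrix(toy, k=2):
        print(["%.3f" % d for d in row])
```

A matrix of this form is the usual input to distance-based tree builders such as neighbor-joining or UPGMA. No sequence alignment is needed at any step, which is where the speed advantage of alignment-free methods comes from.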
