Constructing Neighbor-Joining phylogenetic trees with reduced redundancy computation

A fast algorithm for constructing Neighbor-Joining phylogenetic trees has been developed. Its CPU time is drastically reduced compared with Saitou and Nei's algorithm (SN) and Studier and Keppler's algorithm (SK). The new algorithm combines three techniques. First, a linear array A[N] stores the sum of every row of the distance matrix (as in SK), which eliminates many repeated (redundant) computations. Second, the values of A[i] are computed only once, at the beginning of the algorithm, and are then updated with three elements in each iteration. Third, a very compact formula has been designed for the sum of all branch lengths associated with joining OTUs (Operational Taxonomic Units) i and j. The results show that, as N increases, our algorithm becomes tens to hundreds of times faster than SN and about twice as fast as SK. It constructs a tree with 2000 OTUs in 3 minutes on our desktop computer (CPU: Intel Celeron 2.4 GHz, RAM: 256 MB, OS: Windows 2000 Professional).
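The three techniques can be illustrated with a short Python sketch. It is not the authors' implementation: the function name neighbor_joining, the dict-of-dicts matrix, and the reading of "three elements" as d(i,k), d(j,k) and the new distance d(u,k) are assumptions made for illustration. Starting from Saitou and Nei's definition of the sum of branch lengths S_ij, and writing A[i] for the row sum and T for the sum of all pairwise distances among the m remaining nodes, one compact form is S_ij = (2T - A[i] - A[j] + (m - 2) d(i,j)) / (2(m - 2)); the authors' exact formula may differ.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Hypothetical NJ sketch with cached row sums (needs >= 2 OTUs); returns Newick."""
    n = len(names)
    # Working distance matrix as dict-of-dicts so new internal nodes can be added.
    d = {i: {j: float(dist[i][j]) for j in range(n) if j != i} for i in range(n)}
    label = {i: names[i] for i in range(n)}
    # Row sums A[i]: computed once here, afterwards only updated incrementally.
    A = {i: sum(d[i].values()) for i in range(n)}
    next_id = n
    while len(d) > 2:
        m = len(d)
        T = sum(A.values()) / 2  # sum of all pairwise distances among active nodes
        best = None
        for i, j in combinations(d, 2):
            # Compact S_ij: O(1) per pair thanks to the cached A[i], A[j] and T.
            s = (2 * T - A[i] - A[j] + (m - 2) * d[i][j]) / (2 * (m - 2))
            if best is None or s < best[0]:  # ties: keep the first pair found
                best = (s, i, j)
        _, i, j = best
        dij = d[i][j]
        # Branch lengths from i and j to the new internal node u (Saitou and Nei).
        bi = dij / 2 + (A[i] - A[j]) / (2 * (m - 2))
        bj = dij - bi
        u = next_id
        next_id += 1
        d[u], A[u] = {}, 0.0
        for k in list(d):
            if k in (i, j, u):
                continue
            duk = (d[i][k] + d[j][k] - dij) / 2
            # Three-element update of A[k]: remove d(i,k) and d(j,k), add d(u,k).
            A[k] += duk - d[i][k] - d[j][k]
            del d[k][i], d[k][j]
            d[k][u] = d[u][k] = duk
            A[u] += duk
        label[u] = f"({label[i]}:{bi:g},{label[j]}:{bj:g})"
        del d[i], d[j], A[i], A[j]
    a, b = d  # the two remaining nodes; the last edge length is attached to b
    return f"({label[a]},{label[b]}:{d[a][b]:g});"


if __name__ == "__main__":
    # Additive distances generated by the tree ((a:1,b:2),(c:3,d:4)) with internal edge 5.
    D = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
    print(neighbor_joining(list("abcd"), D))  # ((a:1,b:2),(c:3,d:4):5);
```

On this additive four-OTU matrix the sketch recovers the generating tree. Each iteration spends O(m) work refreshing A and O(m^2) work scanning pairs, so the sketch runs in O(N^3) time overall.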

[1] Robert C. Edgar, et al. Local homology recognition and distance measures in linear time using compressed amino acid alphabets. 2004, Nucleic Acids Research.

[2] W. H. Day. Computational complexity of inferring phylogenies from dissimilarity matrices. 1987, Bulletin of Mathematical Biology.

[3] K. Katoh, et al. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. 2002, Nucleic Acids Research.

[4] D. Higgins, et al. T-Coffee: A novel method for fast and accurate multiple sequence alignment. 2000, Journal of Molecular Biology.

[5] J. Thompson, et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. 1994, Nucleic Acids Research.

[6] Chen Yang, et al. PTC: an interactive tool for phylogenetic tree construction. 2003, Computational Systems Bioinformatics (CSB2003), Proceedings of the 2003 IEEE Bioinformatics Conference.

[7] Robert C. Edgar, et al. MUSCLE: multiple sequence alignment with high accuracy and high throughput. 2004, Nucleic Acids Research.

[8] M. Nei, et al. The neighbor-joining method. 1987.

[9] R. Graham, et al. The Steiner problem in phylogeny is NP-complete. 1982.

[10] J. A. Studier, et al. A note on the neighbor-joining algorithm of Saitou and Nei. 1988, Molecular Biology and Evolution.

[11] N. Saitou, et al. Relative Efficiencies of the Fitch-Margoliash, Maximum-Parsimony, Maximum-Likelihood, Minimum-Evolution, and Neighbor-joining Methods of Phylogenetic Tree Construction in Obtaining the Correct Tree. 1989.