Accelerating the Neighbor-Joining Algorithm Using the Adaptive Bucket Data Structure

The complexity of the neighbor-joining method is determined by the complexity of the search for an optimal pair ("neighbors to join") performed globally at each iteration. Accelerating the neighbor-joining method requires performing a smarter search for an optimal pair of neighbors, avoiding re-evaluation of all possible pairs of points at each iteration. We developed an acceleration technique for the neighbor-joining method that significantly decreases complexity for important applications without any change in the neighbor-joining method. This technique utilizes the bucket data structure. The pairs of nodes are arranged in buckets according to values of the goal function δ_ij = u_i + u_j - d_ij. Buckets are adaptively re-arranged after each neighbor-joining step. While the pairs of nodes in the top bucket are re-evaluated at every iteration, pairs in lower buckets are accessed more rarely, when the algorithm determines that the elements of the bucket need to be re-evaluated based on new values of δ_ij. As a result, only a small portion of candidate pairs of nodes is examined at each iteration. The algorithm is cache efficient, since the bucket data structures are able to exploit locality and adjust to cache properties.
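The bucketing idea can be illustrated with a minimal Python sketch of a single selection step. It is not the authors' implementation: the abstract does not define u_i, so the sketch assumes the common Studier-Keppler form u_i = (sum over k of d_ik) / (n - 2), and the function names, the equal-width bucket boundaries, and the number of buckets are all hypothetical choices. The adaptive re-arrangement of buckets after each join is not shown.

```python
from itertools import combinations


def goal_values(d):
    """Return {(i, j): u_i + u_j - d_ij} for a symmetric distance matrix d."""
    n = len(d)
    # Assumption: u_i is the row sum divided by (n - 2); not stated in the text.
    u = [sum(row) / (n - 2) for row in d]
    return {(i, j): u[i] + u[j] - d[i][j] for i, j in combinations(range(n), 2)}


def build_buckets(delta, num_buckets):
    """Group pairs into equal-width value ranges; bucket 0 holds the largest values."""
    lo, hi = min(delta.values()), max(delta.values())
    width = (hi - lo) / num_buckets or 1.0  # avoid a zero width when all values tie
    buckets = [[] for _ in range(num_buckets)]
    for pair, value in delta.items():
        index = min(int((hi - value) / width), num_buckets - 1)
        buckets[index].append(pair)
    return buckets


def best_pair(delta, buckets):
    """Scan only the first non-empty bucket, which must contain the maximum."""
    top = next(bucket for bucket in buckets if bucket)
    return max(top, key=delta.__getitem__)


# Hypothetical additive distances for four taxa on the tree ((0,1),(2,3)).
d = [
    [0, 3, 7, 8],
    [3, 0, 6, 7],
    [7, 6, 0, 3],
    [8, 7, 3, 0],
]
delta = goal_values(d)
buckets = build_buckets(delta, num_buckets=3)
pair = best_pair(delta, buckets)
```

In this toy input the pairs (0, 1) and (2, 3) tie for the largest goal value, so both land in the top bucket and only those two candidates, rather than all six pairs, are compared when choosing the neighbors to join.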
