Phylogeny Inference Based on Spectral Graph Clustering

Phylogeny inference is an important issue in computational biology. Early character-based approaches, such as maximum parsimony and maximum likelihood, become intractable when the number of taxonomic units is large. More recent distance-based algorithms that adopt an agglomerative scheme are widely used for phylogeny inference. However, they must recursively merge the nearest pair of taxa and re-estimate the distance matrix at each step. Errors can therefore accumulate and lead to an inaccurate tree topology. In this study, we propose a splitting algorithm for phylogeny inference that uses the spectral graph clustering (SGC) technique. The SGC algorithm splits graphs according to the maximum cut criterion and sidesteps the underlying combinatorial optimization problem by solving a generalized eigenvalue system instead. The proposed algorithm has the following promising features: (i) it uses a heuristic strategy to construct phylogenies from certain distance functions that need not even be additive; (ii) distance matrices do not have to be estimated recursively; (iii) it infers more accurate tree topologies than the Neighbor-joining (NJ) algorithm on simulated datasets; and (iv) it strongly supports hypotheses induced by other methods for Baculovirus genomes. Our numerical experiments confirm that the SGC algorithm is efficient for phylogeny inference.
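The splitting idea summarized above can be illustrated with a short sketch. This is a minimal sketch under stated assumptions, not the authors' SGC implementation. We assume that the edge weights W are the pairwise distances themselves. We also assume that each split is read off the sign pattern of the generalized eigenvector of (D - W) y = lambda D y with the largest eigenvalue, where D is the diagonal degree matrix; this is a standard spectral relaxation of the maximum cut. Recursion is assumed to stop at one or two taxa. The function names `spectral_maxcut_split` and `build_tree` are hypothetical.

```python
import numpy as np


def spectral_maxcut_split(dist: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Bipartition taxa with a spectral relaxation of the maximum cut.

    Illustrative assumption: the edge weights W are the pairwise distances,
    and the split is read off the sign pattern of the generalized
    eigenvector of (D - W) y = lam * D y with the largest eigenvalue.
    """
    w = np.asarray(dist, dtype=float)
    deg = w.sum(axis=1)                       # weighted degree D_ii (assumed > 0)
    lap = np.diag(deg) - w                    # graph Laplacian L = D - W
    # Turn the generalized problem into a standard symmetric one:
    # D^{-1/2} L D^{-1/2} z = lam * z, with y = D^{-1/2} z.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    sym = d_inv_sqrt[:, None] * lap * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(sym)             # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, -1]              # eigenvector for the largest eigenvalue
    left = np.flatnonzero(y >= 0)
    right = np.flatnonzero(y < 0)
    if left.size == 0 or right.size == 0:     # defensive guard against a one-sided split
        left, right = np.arange(1), np.arange(1, len(w))
    return left, right


def build_tree(dist: np.ndarray, labels: list[str]):
    """Split recursively and return the topology as nested tuples of labels.

    Each recursive call only takes a sub-block of the input matrix, so no
    distance matrix is re-estimated along the way.
    """
    dist = np.asarray(dist, dtype=float)
    if len(labels) == 1:
        return labels[0]
    if len(labels) == 2:
        return (labels[0], labels[1])
    left, right = spectral_maxcut_split(dist)

    def subtree(idx: np.ndarray):
        # Restrict both the distances and the labels to one side of the cut.
        return build_tree(dist[np.ix_(idx, idx)], [labels[i] for i in idx])

    return (subtree(left), subtree(right))


# Two tight pairs, {A, B} and {C, D}, that are far from each other.
d = np.array([[0, 2, 10, 10],
              [2, 0, 10, 10],
              [10, 10, 0, 2],
              [10, 10, 2, 0]])
print(build_tree(d, ["A", "B", "C", "D"]))
```

On this toy matrix, the sketch places {A, B} and {C, D} as sibling subtrees. Their order in the printed tuple depends on the arbitrary sign of the eigenvector. Because every recursive call works on a sub-block of the original matrix, the sketch reflects feature (ii). The nested tuples encode a rooted binary topology only. Rooting and branch-length estimation are outside the scope of this illustration.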
