Top-down phylogenetic tree reconstruction

We propose a novel method for reconstructing phylogenetic trees. It is based on conceptual clustering and extends the well-known decision tree learning approach. The method therefore differs from existing methods algorithmically, in the information it uses, and in the assumptions it makes. It scales better with the number of sequences and can thus be applied to large sequence sets, beyond the point where other methods break down. By nature, it has a stronger tendency to construct trees in which subspecies are defined by particular polymorphisms. Experiments show that its performance is comparable to that of state-of-the-art methods.
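To make the top-down idea concrete, the following is a minimal sketch of divisive clustering on aligned sequences, written in the style of decision tree induction. It is not the authors' implementation. The split test ("position p holds character c"), the cost function (summed pairwise Hamming distance within each subset), and all function names are assumptions chosen for illustration. This naive version recomputes all pairwise distances at every candidate split, so it does not reflect the scalability claimed for the actual method.

```python
from itertools import combinations


def hamming(a, b):
    """Number of alignment positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def within_cost(names, seqs):
    """Assumed heuristic: sum of pairwise Hamming distances inside one cluster."""
    return sum(hamming(seqs[a], seqs[b]) for a, b in combinations(names, 2))


def best_split(names, seqs):
    """Try every test 'position pos holds character char' and keep the cheapest.

    Returns (cost, pos, char, left, right), or None if no test separates the set.
    """
    length = len(seqs[names[0]])
    best = None
    for pos in range(length):
        for char in sorted({seqs[n][pos] for n in names}):
            left = [n for n in names if seqs[n][pos] == char]
            right = [n for n in names if seqs[n][pos] != char]
            if not left or not right:
                continue  # the test does not split the set
            cost = within_cost(left, seqs) + within_cost(right, seqs)
            if best is None or cost < best[0]:
                best = (cost, pos, char, left, right)
    return best


def build_tree(names, seqs):
    """Recursively split the sequence set; leaves are sequence names."""
    if len(names) == 1:
        return names[0]
    split = best_split(names, seqs)
    if split is None:
        # Identical sequences cannot be separated; keep them as one unresolved node.
        return tuple(names)
    _, _, _, left, right = split
    return (build_tree(left, seqs), build_tree(right, seqs))


# Toy alignment (hypothetical data, four sequences of length four).
seqs = {"A": "AAAA", "B": "AAAT", "C": "GGCC", "D": "GGCT"}
tree = build_tree(["A", "B", "C", "D"], seqs)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

Because every internal node corresponds to a test on a single alignment position, each subtree in this sketch is characterized by a specific polymorphism, which illustrates the tendency described in the abstract.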
