High performance phylogenetic inference

Phylogenetic analysis is an integral part of many biological research programs. In essence, it is the study of gene genealogy: how genes mutate and how the resulting lineages are related across generations. It is used in areas as diverse as human epidemiology, viral transmission, biogeography, and systematics. Researchers now routinely sequence DNA from many individuals, producing very large data sets. Our ability to analyze these data has not kept pace with their generation, and phylogenetics has reached a crossroads where the data being generated cannot be analyzed effectively. The chief challenge of phylogenetic systematics in the next century will be to develop algorithms and search strategies that can analyze large data sets effectively. The crux of the computational problem is that the landscape of possible tree topologies can be extraordinarily difficult to evaluate for large data sets. The parsimony ratchet is a family of iterative tree search methods that take a statistical approach to sampling tree islands and ultimately finding the most parsimonious trees for a data set. Because there is no direct dependency between iterations, the iterations of the parsimony ratchet can run in parallel. The authors' implementation of the parallel ratchet is master-worker based: a master process launched in the DOGMA system starts worker tasks on available nodes. Each worker task is simply wrapper code that interacts with the newest release version of NONA.
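
The master-worker structure described above can be illustrated with a minimal Python sketch. It is not the authors' DOGMA/NONA implementation: a local thread pool stands in for DOGMA nodes, and a toy leaf-swap search on ladder-shaped trees stands in for NONA. All names (`ALIGNMENT`, `ratchet_iteration`, `master`), the five-taxon data, and the reweighting probability are assumptions made for illustration. Each iteration perturbs character weights, searches under the perturbed weights, then searches again under the original weights; because iterations share no state, the master can dispatch them independently.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Toy alignment (made-up data): one string per taxon, one column per character.
ALIGNMENT = ["AAGT", "AAGC", "ACTC", "CCTT", "CCTC"]

def build_tree(order):
    """Build a ladder-shaped tree as nested tuples from a leaf ordering."""
    tree = order[0]
    for leaf in order[1:]:
        tree = (tree, leaf)
    return tree

def fitch_length(tree, column):
    """Fitch parsimony length of one character; column[i] is the state of leaf i."""
    def walk(node):
        if isinstance(node, int):
            return {column[node]}, 0
        (left, lcost), (right, rcost) = walk(node[0]), walk(node[1])
        shared = left & right
        if shared:
            return shared, lcost + rcost
        return left | right, lcost + rcost + 1
    return walk(tree)[1]

def tree_length(order, weights):
    """Weighted parsimony length of the ladder tree defined by a leaf ordering."""
    tree = build_tree(order)
    total = 0
    for site, weight in enumerate(weights):
        column = [seq[site] for seq in ALIGNMENT]
        total += weight * fitch_length(tree, column)
    return total

def hill_climb(order, weights, rng, steps=50):
    """Toy stand-in for a real tree search: accept leaf swaps that do not worsen length."""
    best = list(order)
    best_len = tree_length(best, weights)
    for _ in range(steps):
        i, j = rng.sample(range(len(best)), 2)
        candidate = list(best)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_len = tree_length(candidate, weights)
        if candidate_len <= best_len:
            best, best_len = candidate, candidate_len
    return best

def ratchet_iteration(seed):
    """One independent ratchet iteration (the worker task in this sketch)."""
    rng = random.Random(seed)  # private RNG, so iterations share no state
    n_sites = len(ALIGNMENT[0])
    original = [1] * n_sites
    start = list(range(len(ALIGNMENT)))
    rng.shuffle(start)
    start = hill_climb(start, original, rng)
    # Perturb: upweight a random subset of characters (25% is an assumed value).
    perturbed = [2 if rng.random() < 0.25 else 1 for _ in range(n_sites)]
    moved = hill_climb(start, perturbed, rng)
    # Restore the original weights and search again from the perturbed result.
    final = hill_climb(moved, original, rng)
    return tree_length(final, original), build_tree(final)

def master(n_iterations=8, n_workers=4):
    """Dispatch independent iterations to workers and keep the shortest trees."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(ratchet_iteration, range(n_iterations)))
    best_len = min(length for length, _ in results)
    return best_len, [tree for length, tree in results if length == best_len]
```

Calling `master()` returns the shortest length found together with the trees that achieve it. In a distributed setting the thread pool would be replaced by whatever mechanism launches tasks on remote nodes, and `ratchet_iteration` by a wrapper around an external search program.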
