Techniques for Analyzing Large Data Sets

Parsimony problems with medium or large numbers of taxa can be analyzed only by means of trial-and-error, or “heuristic,” methods. Traditional strategies for finding most parsimonious trees have long been in use and are implemented in the programs Hennig86 [1], PAUP [2], and NONA [3]. Although these techniques succeed for small and medium-sized data sets, they usually fail for very large data sets, i.e., those with 200 or more taxa. Very large data sets do not simply require more of the same kind of work used to analyze smaller ones; they require qualitatively different techniques. So far, the techniques described here have been used only for prealigned sequences. They could, however, be adapted to other methods of analysis, such as the direct optimization method of Wheeler [4].