MINIMUM MUTATION FITS TO A GIVEN TREE

SUMMARY

A number of objects, such as species, lie at the ends of a known evolutionary tree. A variable taking a finite number of possible values is specified on this set of objects. How can the values of the variable be estimated for the ancestors of the objects? One way is to assign to the ancestors those values which require the minimum number of mutations (or changes) in going from ancestors to their immediate descendants. In this paper, a method of generating all such minimum mutation fits is described.

An evolutionary model for a set of objects is a family tree of possibly hypothetical ancestors through which each object may be traced back to the same primordial ancestor. Evolutionary models are used in the classification of plant and animal life, languages, motor cars, cultures, and religions. The construction of the family tree is a difficult problem requiring synthesis of many types of knowledge.

Suppose that the family tree is given, and that a variable V (such as number of limbs, for animals) is given for the set of objects (such as species, or families) at the ends of the tree. What values will V take for the hypothetical ancestors? A complete answer to this question is a probability distribution over the set of all possible values that the ancestors might take. A more modest answer is to assign values of V to the ancestors in such a way that the minimum number of changes in V occurs between ancestors and their immediate descendants. This "minimum mutation" fit is most likely under some reasonable probability models, but seems compelling in its own right. It is the assignment which permits representation of the data in a minimum number of symbols.

Camin and Sokal [1965] consider the problem of finding an evolutionary tree when each variable has an ordered set of values, and mutation can only take place from a lower to a higher value.
Estabrook [1968] extends this structure on the values of the variable to be a partial order with tree structure: for each variable, an evolutionary tree is known connecting the values. In both of these formulations, the minimum mutation fit to a given tree is not a serious problem. The optimal value for an ancestor is always the most primitive value in its descendants. Cavalli-Sforza and Edwards [1967] consider minimum mutation fits