RANDOM DATA, HOMOPLASY AND INFORMATION

It is often assumed that homoplasy makes cladistic results uncertain. The minimum values that the consistency index C (Kluge and Farris, 1969) can achieve on most parsimonious trees decrease with the number of taxa and depend in a more complex way on the number of characters. Those minimum values have never been calculated, and therefore it is not known which values of C would indicate that there is as much homoplasy as is possible in a data set of a given size. Several authors have examined this problem recently, mostly by generating random data sets. The question they seem to be asking is "how can we find whether the data are as 'bad' as possible?"
In a separate paper (Goloboff, 1991), I have analysed the relationship between homoplasy and the degree to which a data set provides information to choose among the possible trees. I termed the degree to which the possible resolved trees for the data differ in length the cladistic decisiveness of the data. Indecisive data provide no information regarding tree choice, and vice versa. I showed that, contrary to general belief, greater amounts of homoplasy on the most parsimonious tree do not imply that the data are less decisive. This is not surprising, since data for which no choice among trees is possible are those data for which every (resolved) tree requires the same amount of homoplasy, not necessarily those data with high homoplasy. I argued there that, contrary to the statements by some authors, C is a good measure of homoplasy. Another widely used statistic is the retention index (RI). As Farris (1989a) proposed, the RI (strictly speaking, its complement) expresses the homoplasy on a tree as a fraction of the maximum homoplasy that can be required for the data; thus, RI is logically not a measure of the homoplasy itself.
I showed that neither C nor RI (nor their product, the rescaled consistency index, RC) necessarily has lower values on the most parsimonious tree when the possible trees for the data differ less in length (i.e. when the data are less decisive). None of C, RI or RC was originally proposed to measure decisiveness; for that purpose, I proposed a statistic, Data Decisiveness (DD). DD increases when the possible trees differ more in tree length, and it is 0 when all the possible resolved trees have the same length. In this paper, I will examine the proposals by other authors to assess informativeness and homoplasy of the data.
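
The statistics named above can be illustrated with a small numerical sketch. The formulas are not given in this section and are taken here as assumptions from the usual definitions: C = M/S, RI = (G - S)/(G - M), RC = C x RI, and DD = (Sbar - S)/(Sbar - M), where M is the minimum number of steps the characters could require, G the maximum, S the length of the most parsimonious tree, and Sbar the mean length of all possible resolved trees. The function names and the input values are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def consistency_index(min_steps, tree_length):
    """C = M / S (assumed standard definition)."""
    return Fraction(min_steps, tree_length)


def retention_index(min_steps, max_steps, tree_length):
    """RI = (G - S) / (G - M) (assumed standard definition)."""
    if max_steps == min_steps:
        raise ValueError("RI is undefined when G equals M")
    return Fraction(max_steps - tree_length, max_steps - min_steps)


def rescaled_consistency_index(min_steps, max_steps, tree_length):
    """RC = C * RI, the product described in the text."""
    return (consistency_index(min_steps, tree_length)
            * retention_index(min_steps, max_steps, tree_length))


def data_decisiveness(min_steps, mean_length, tree_length):
    """DD = (Sbar - S) / (Sbar - M) (formula assumed, see lead-in)."""
    if mean_length == min_steps:
        raise ValueError("DD is undefined when the mean length equals M")
    return Fraction(mean_length - tree_length) / Fraction(mean_length - min_steps)


# Hypothetical data set: M = 10, G = 30, shortest tree S = 20.
M, G, S = 10, 30, 20
print("C  =", consistency_index(M, S))
print("RI =", retention_index(M, G, S))
print("RC =", rescaled_consistency_index(M, G, S))

# Same homoplasy on the shortest tree, two different mean tree lengths.
print("DD (mean length 25) =", data_decisiveness(M, 25, S))
print("DD (mean length 20) =", data_decisiveness(M, 20, S))
```

In both hypothetical cases C is the same, yet DD falls to 0 when the mean length of the possible trees equals the length of the shortest tree. Under the assumed formulas, this is the point made in the text: the amount of homoplasy on the most parsimonious tree does not by itself determine how decisive the data are.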

[1] J. Archie. Homoplasy Excess Statistics and Retention Indices: A Reply to Farris, 1990.

[2] J. Farris. The Retention Index and the Rescaled Consistency Index, 1989, Cladistics: the international journal of the Willi Hennig Society.

[3] J. Farris. Excess Homoplasy Ratios, 1991.

[4] Daniel P. Faith, et al. Could a Cladogram This Short Have Arisen by Chance Alone?: On Permutation Tests for Cladistic Structure, 1991.

[5] James W. Archie, et al. A Randomization Test for Phylogenetic Information in Systematic Data, 1989.

[6] P. Goloboff. Homoplasy and the Choice Among Cladograms, 1991, Cladistics: the international journal of the Willi Hennig Society.

[7] J. Farris. The Efficient Diagnoses of the Phylogenetic System, 1980.

[8] James W. Archie, et al. Homoplasy Excess Ratios: New Indices for Measuring Levels of Homoplasy in Phylogenetic Systematics and a Critique of the Consistency Index, 1989.

[9] Frequency Distributions of Lengths of Possible Networks from a Data Matrix, 1989.

[10] L. Werdelin. We Are Not Out of the Woods Yet: A Report from a Nobel Symposium, 1989.

[11] J. Farris, et al. Quantitative Phyletics and the Evolution of Anurans, 1969.

[12] James S. Farris. Simplicity and Informativeness in Systematics and Phylogeny, 1982.

[13] M. Donoghue, et al. Patterns of Variation in Levels of Homoplasy, 1989, Evolution; international journal of organic evolution.

[14] O. Seberg. The Seventh Annual Meeting of the Willi Hennig Society, 1989, Cladistics: the international journal of the Willi Hennig Society.

[15] James S. Farris, et al. The Retention Index and Homoplasy Excess, 1989.

[16] J. Archie. Phylogenies of Plant Families: A Demonstration of Phylogenetic Randomness in DNA Sequence Data Derived from Proteins, 1989, Evolution; international journal of organic evolution.

[17] James S. Farris, et al. The Information Content of the Phylogenetic System, 1979.