Congruence versus phylogenetic accuracy: revisiting the incongruence length difference test.

Phylogenies inferred from independent data partitions usually differ from one another in topology even though they are drawn from the same set of organisms (Rodrigo et al., 1993). Some topological differences are due to sampling error or to the use of inappropriate phylogenetic models. These types of topological incongruence do not originate in genealogical discordance, i.e., differences between the phylogenies underlying the respective data partitions (Baum et al., 1998). Incongruence that is not due to genealogical discordance can often be addressed by modifying the model used in phylogenetic reconstruction (Cunningham, 1997b), and combining data is an appropriate way of dealing with random topological differences that are attributable to sampling error. However, other topological differences, e.g., those arising from lineage sorting (Maddison, 1997; Avise, 2000) and hybridization (Dumolin-Lapegue et al., 1997; Rieseberg, 1997; McKinnon et al., 1999; Avise, 2000), reflect genealogical discordance between the data partitions. Most systematists consider data partitions to be combinable if and only if they are not strongly incongruent with one another (Sytsma, 1990; Bull et al., 1993; Huelsenbeck et al., 1996; Baum et al., 1998; Johnson and Soltis, 1998; Thornton and DeSalle, 2000; Yoder et al., 2001; Barker and Lutzoni, 2002; Buckley et al., 2002). Systematists who follow this prior agreement or conditional combination approach to analyzing multiple data partitions (Bull et al., 1993; Huelsenbeck et al., 1996; Johnson and Soltis, 1998) evaluate incongruence using tests such as the incongruence length difference (ILD) test (Farris et al., 1994, 1995) or other tests of taxonomic congruence (Templeton, 1983; Kishino and Hasegawa, 1989; Larson, 1994; Shimodaira and Hasegawa, 1999) before deciding whether the partitions should be analyzed in combination. Data that exhibit strong incongruence are then analyzed separately or under assumptions that minimize incongruence (Cunningham, 1997b).

In their article “Failure of the ILD to determine data combinability for slow loris phylogeny,” Yoder et al. (2001) critiqued the ILD test based on the observation that it will sometimes identify data partitions as incongruent when in fact those partitions combine to produce an accurate estimate of organismal phylogeny. They described the ILD test as a failed test of data combinability, maintaining that the presumed accuracy of trees inferred from combined data indicates the congruence of the data partitions. We have two objections to their argument (2001:421) that “the ILD [should] never be used as a test of data partition combinability.” First, what Yoder et al. described as a flaw in the ILD test as applied to their data, i.e., an apparent inverse relationship between phylogenetic accuracy and data partition congruence as measured by the ILD test, turns out to be an artifact of analysis. There is in fact a bimodal relationship between congruence and accuracy: as either data partition is upweighted, homoplasy in the combined data set is swamped by homoplasy within the upweighted data partition, reducing the significance of the ILD test. At the same time, the topology of the combined analysis shifts to reflect the topology of the upweighted data partition. This phenomenon is predictable and can be accounted for in the analysis (Dowton and Austin, 2002). Second, Yoder et al.’s expectation that ILD test results should predict the phylogenetic accuracy of the combined data analysis is unreasonable. The ILD test evaluates the null hypothesis that the characters making up two or more data partitions are drawn at random from a single population of characters, i.e., a population of characters that reflects a single phylogeny and a single set of evolutionary processes (Farris et al., 1995).
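The null hypothesis just described can be made concrete with a small permutation sketch. The code below is only an illustration under strong simplifying assumptions, not the procedure of Farris et al. or of PAUP*: it is restricted to four taxa (so the three unrooted topologies can be enumerated exhaustively), uses unweighted Fitch parsimony, and represents each character as a four-letter string. The ILD statistic is taken as the length of the shortest tree for the combined data minus the summed lengths of the shortest trees for each partition, and the null distribution is built by randomly reassigning characters to partitions of the original sizes. All function names (`fitch_length`, `mp_length`, `ild_statistic`, `ild_test`) are hypothetical.

```python
import random

# The three unrooted topologies for four taxa, written as two cherries of taxon indices.
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

def fitch_length(column, topology):
    """Minimum number of state changes for one character on one quartet tree."""
    changes = 0
    cherry_sets = []
    for a, b in topology:
        sa, sb = {column[a]}, {column[b]}
        common = sa & sb
        if common:
            cherry_sets.append(common)
        else:
            # Disjoint state sets at a cherry imply one change (Fitch union step).
            cherry_sets.append(sa | sb)
            changes += 1
    # One further change is needed across the internal branch if the sets are disjoint.
    if not (cherry_sets[0] & cherry_sets[1]):
        changes += 1
    return changes

def mp_length(columns):
    """Length of the most parsimonious quartet tree for a list of characters."""
    return min(sum(fitch_length(c, t) for c in columns) for t in QUARTETS)

def ild_statistic(part_a, part_b):
    """Combined shortest-tree length minus the sum of per-partition lengths."""
    return mp_length(part_a + part_b) - (mp_length(part_a) + mp_length(part_b))

def ild_test(part_a, part_b, replicates=999, seed=0):
    """Permutation P-value: share of random repartitions with ILD >= observed."""
    rng = random.Random(seed)
    observed = ild_statistic(part_a, part_b)
    pooled = part_a + part_b
    n_a = len(part_a)
    as_extreme = 0
    for _ in range(replicates):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        # Random partitions keep the original sizes, as in the null hypothesis.
        if ild_statistic(shuffled[:n_a], shuffled[n_a:]) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (replicates + 1)

if __name__ == "__main__":
    # Two toy partitions whose characters support different quartet topologies.
    conflicting_a = ["AACC"] * 10
    conflicting_b = ["ACAC"] * 10
    print(ild_test(conflicting_a, conflicting_b))
```

In this toy setting, a small P-value says only that the two partitions are unlikely to be random draws from one pool of characters. It says nothing about whether the combined tree is accurate, which is the distinction the text draws.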
Because accuracy of trees derived from a data set depends on many factors other than congruence among data partitions, the ILD test cannot be used to directly address questions related to phylogenetic accuracy. Genealogically discordant data can be combined to yield accurate phylogenies, whereas data that are congruent (both genealogically concordant and homogeneous in underlying evolutionary process) can be combined to yield phylogenies that do not accurately represent organismal history (Cunningham, 1997a). A damaging critique of the ILD test would have to appeal to criteria other than

[1]  F. K. Barker, et al. The utility of the incongruence length difference test. 2002, Systematic biology.

[2]  Hidetoshi Shimodaira. An approximately unbiased test of phylogenetic tree selection. 2002, Systematic biology.

[3]  G. Lecointre, et al. When does the incongruence length difference test fail? 2002, Molecular biology and evolution.

[4]  P. Lewis. A likelihood approach to estimating phylogeny from discrete morphological character data. 2001, Systematic biology.

[5]  J. S. Rogers, et al. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. 2001, Systematic biology.

[6]  B. Payseur, et al. Failure of the ILD to determine data combinability for slow loris phylogeny. 2001, Systematic biology.

[7]  Hidetoshi Shimodaira, et al. Evaluating hypotheses on the origin and evolution of the New Zealand alpine cicadas (Maoricicada) using multiple-comparison tests of tree topology. 2001, Molecular biology and evolution.

[8]  A. Rodrigo, et al. Likelihood-based tests of topologies in phylogenetics. 2000, Systematic biology.

[9]  C. Orme, et al. Noise and incongruence: interpreting results of the incongruence length difference test. 2000, Molecular phylogenetics and evolution.

[10]  J. Searle. Phylogeography: The History and Formation of Species. 2000, Heredity.

[11]  R. DeSalle, et al. A new method to localize and test the significance of incongruence: detecting domain shuffling in the nuclear receptor superfamily. 2000, Systematic biology.

[12]  Bin Ma, et al. From Gene Trees to Species Trees. 2000, SIAM J. Comput.

[13]  Hidetoshi Shimodaira, et al. Multiple Comparisons of Log-Likelihoods with Applications to Phylogenetic Inference. 1999, Molecular Biology and Evolution.

[14]  D. Steane, et al. Incongruence between chloroplast and species phylogenies in Eucalyptus subgenus Monocalyptus (Myrtaceae). 1999, American journal of botany.

[15]  J. Wendel, et al. Biogeography and floral evolution of baobabs (Adansonia, Bombacaceae) as inferred from multiple data sets. 1998, Systematic biology.

[16]  Loren H. Rieseberg, et al. Hybrid Origins of Plant Species. 1997.

[17]  C. Cunningham. Is congruence between data partitions a reliable predictor of phylogenetic accuracy? Empirically testing an iterative procedure for choosing among phylogenetic methods. 1997, Systematic biology.

[18]  R. Petit, et al. Phylogeographic structure of white oaks throughout the European continent. 1997, Genetics.

[19]  C. Cunningham, et al. Can three incongruence tests predict when data should be combined? 1997, Molecular biology and evolution.

[20]  E. Kellogg, et al. Testing for Phylogenetic Conflict Among Molecular Data Sets in the Tribe Triticeae (Gramineae). 1996.

[21]  J. William, et al. Combining data in phylogenetic analysis. 1996, Trends in ecology & evolution.

[22]  Carol J. Bult, et al. Constructing a Significance Test for Incongruence. 1995.

[23]  B. Schierwater, et al. Molecular Ecology and Evolution: Approaches and Applications. 1995, Experientia Supplementum.

[24]  C. Bult, et al. Testing Significance of Incongruence. 1994.

[25]  A. Yoder. Relative position of the Cheirogaleidae in strepsirhine phylogeny: a comparison of morphological and molecular methods and results. 1994, American journal of physical anthropology.

[26]  J. Bull, et al. Partitioning and combining data in phylogenetic analysis. 1993.

[27]  Allen G. Rodrigo, et al. A randomisation test of the null hypothesis that two cladograms are sample estimates of a parametric phylogenetic tree. 1993.

[28]  M. Miyamoto, et al. Phylogenetic Analysis of DNA Sequences. 1991.

[29]  K. Sytsma. DNA and morphology: Inference of plant phylogeny. 1990.

[30]  Michael D. Hendy, et al. A Framework for the Quantitative Study of Evolutionary Trees. 1989.

[31]  H. Kishino, et al. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea. 1989, Journal of Molecular Evolution.

[32]  A. R. Templeton, et al. Phylogenetic Inference from Restriction Endonuclease Cleavage Site Maps with Particular Reference to the Evolution of Humans and the Apes. 1983, Evolution; international journal of organic evolution.

[33]  J. Felsenstein. Cases in which Parsimony or Compatibility Methods will be Positively Misleading. 1978.

[34]  A. Austin, et al. Increased congruence does not necessarily indicate increased phylogenetic accuracy--the behavior of the incongruence length difference test in mixed-model analyses. 2002, Systematic biology.

[35]  Peter Arensburger, et al. Combined data, Bayesian phylogenetics, and the origin of the New Zealand cicada genera. 2002, Systematic biology.

[36]  D. Swofford. PAUP*: Phylogenetic analysis using parsimony (*and other methods), Version 4.0b10. 2002.

[37]  J. Huelsenbeck, et al. MRBAYES: Bayesian inference of phylogeny. 2001.

[38]  M. P. Cummings, et al. PAUP* Phylogenetic analysis using parsimony (*and other methods) Version 4. 2000.

[39]  Leigh A. Johnson, et al. Assessing Congruence: Empirical Examples from Molecular Data. 1998.

[40]  A. Larson. The comparison of morphological and molecular data in phylogenetic systematics. 1994, EXS.

[41]  D. Ord, et al. PAUP: Phylogenetic analysis using parsimony. 1993.