Combining Multiple Datasets in a Likelihood Analysis: Which Models Are Best?

Until recently, phylogenetic analyses were routinely based on homologous sequences of a single gene. Given the vast number of gene sequences now available, phylogenetic studies are increasingly based on the analysis of multiple genes, so statistical methods for combining multiple molecular datasets have become necessary. Here, we compare several models for combining different genes for the purpose of evaluating the likelihood of tree topologies. Three methods of branch length estimation were studied: assuming that all genes have the same branch lengths (concatenate model); assuming that branch lengths are proportional among genes (proportional model); or assuming that each gene has a separate set of branch lengths (separate model). We also compared three models of among-site rate variation: the homogeneous model, a model that assumes one gamma parameter for all genes, and a model that assumes one gamma parameter for each gene. On the basis of two nuclear and one mitochondrial amino-acid datasets, our results suggest that, depending on the dataset chosen, either the separate model or the proportional model is the most appropriate method for branch length estimation. For all datasets examined, assigning one gamma parameter to each gene is the best model for among-site rate variation. Using these models, we analyzed alternative mammalian tree topologies and described the effect of the assumed model on the maximum likelihood tree. We show that the choice of model affects which phylogeny is inferred as best.
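The three branch-length models and three rate-variation models differ mainly in how many free parameters they fit, and that count is what any penalized comparison has to weigh against the gain in log-likelihood. The sketch below counts those parameters and ranks model combinations with Akaike's information criterion (AIC = -2 lnL + 2k). It is illustrative only: the abstract does not state which criterion the authors used, the function names are hypothetical, the proportional model is assumed to fix one gene's rate multiplier to 1 (so G - 1 multipliers are free), and the amino-acid substitution matrix (e.g. an empirical matrix such as JTT) is assumed fixed, so its parameters are not counted.

```python
from __future__ import annotations

# Hypothetical labels for the model combinations described in the abstract.
BRANCH_MODELS = ("concatenate", "proportional", "separate")
RATE_MODELS = ("homogeneous", "single_gamma", "gamma_per_gene")


def n_branches(n_taxa: int) -> int:
    """Branches in an unrooted, fully bifurcating tree with n_taxa leaves."""
    if n_taxa < 3:
        raise ValueError("an unrooted binary tree needs at least 3 taxa")
    return 2 * n_taxa - 3


def n_branch_params(model: str, n_taxa: int, n_genes: int) -> int:
    """Free branch-length parameters under each combining model."""
    b = n_branches(n_taxa)
    if model == "concatenate":
        return b                      # one set of lengths shared by all genes
    if model == "proportional":
        return b + (n_genes - 1)      # shared lengths + a rate multiplier per gene,
                                      # with one multiplier fixed to 1 (assumption)
    if model == "separate":
        return n_genes * b            # an independent set of lengths per gene
    raise ValueError(f"unknown branch-length model: {model}")


def n_rate_params(model: str, n_genes: int) -> int:
    """Free gamma shape parameters for among-site rate variation."""
    if model == "homogeneous":
        return 0                      # all sites evolve at the same rate
    if model == "single_gamma":
        return 1                      # one shape parameter for every gene
    if model == "gamma_per_gene":
        return n_genes                # one shape parameter per gene
    raise ValueError(f"unknown rate model: {model}")


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion; smaller is better."""
    return -2.0 * log_likelihood + 2.0 * k


def rank_models(
    loglik: dict[tuple[str, str], float], n_taxa: int, n_genes: int
) -> list[tuple[tuple[str, str], float]]:
    """Sort (branch_model, rate_model) pairs by AIC, best first.

    loglik maps each pair to its maximized combined log-likelihood,
    i.e. the sum of per-gene log-likelihoods for a fixed topology.
    """
    scores = {}
    for (branch_model, rate_model), lnl in loglik.items():
        k = n_branch_params(branch_model, n_taxa, n_genes) + n_rate_params(
            rate_model, n_genes
        )
        scores[(branch_model, rate_model)] = aic(lnl, k)
    return sorted(scores.items(), key=lambda item: item[1])
```

For example, with 10 taxa and 3 genes an unrooted tree has 17 branches, so the concatenate, proportional, and separate models fit 17, 19, and 51 branch-length parameters, and a gamma parameter per gene adds 3 more. The models are nested (concatenate is proportional with all multipliers equal to 1, and proportional is a constrained case of separate), so the separate model always reaches at least as high a log-likelihood; a penalized criterion decides whether that gain justifies the extra parameters.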
