Combining multiple data sets in a likelihood analysis: which models are the best?

Until recently, phylogenetic analyses were routinely based on homologous sequences of a single gene. Given the vast number of gene sequences now available, phylogenetic studies increasingly analyze multiple genes, so statistical methods are needed to combine multiple molecular data sets. Here, we compare several models for combining different genes when evaluating the likelihood of tree topologies. We studied three methods of branch length estimation: assuming that all genes share the same branch lengths (concatenate model), assuming that branch lengths are proportional among genes (proportional model), and assuming that each gene has its own set of branch lengths (separate model). We also compared three models of among-site rate variation: the homogeneous model, a model with one gamma parameter shared by all genes, and a model with one gamma parameter for each gene. On the basis of two nuclear and one mitochondrial amino acid data sets, our results suggest that, depending on the data set, either the separate model or the proportional model is the most appropriate method of branch length estimation. For all the data sets examined, one gamma parameter for each gene is the best model of among-site rate variation. Using these models, we analyzed alternative mammalian tree topologies and describe the effect of the assumed model on the maximum likelihood tree. We show that the choice of model affects the best phylogeny obtained.
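The nine model combinations above differ in how many free parameters they fit, which matters whenever models are ranked by a penalized likelihood criterion. The sketch below counts those parameters and computes the Akaike information criterion (AIC). It is a minimal illustration under stated assumptions, not the procedure used in the study: the abstract does not name the selection criterion, the function names are hypothetical, the tree is assumed to be unrooted and fully bifurcating (2n - 3 branches for n taxa), and the amino acid replacement matrix is assumed to be fixed and to contribute no free parameters.

```python
def branch_length_params(model: str, n_taxa: int, n_genes: int) -> int:
    """Free branch-length parameters under each combination model (assumed counts)."""
    n_branches = 2 * n_taxa - 3  # unrooted, fully bifurcating tree (assumption)
    if model == "concatenate":
        # One set of branch lengths shared by all genes.
        return n_branches
    if model == "proportional":
        # One shared set plus a relative rate for every gene but the first.
        return n_branches + (n_genes - 1)
    if model == "separate":
        # An independent set of branch lengths for each gene.
        return n_genes * n_branches
    raise ValueError(f"unknown branch-length model: {model}")


def rate_params(model: str, n_genes: int) -> int:
    """Free among-site rate variation parameters (gamma shape parameters)."""
    if model == "homogeneous":
        return 0
    if model == "shared_gamma":
        return 1
    if model == "per_gene_gamma":
        return n_genes
    raise ValueError(f"unknown rate model: {model}")


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: lower values indicate a preferred model."""
    return -2.0 * log_likelihood + 2.0 * n_params
```

For a given topology, each model combination would be fitted by maximum likelihood, and the resulting log-likelihood passed to `aic` together with `branch_length_params(...) + rate_params(...)`. The separate model always fits at least as well as the proportional and concatenate models, so the penalty term is what allows a simpler model to be preferred.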
