STATISTICAL APPROACH TO TESTS INVOLVING PHYLOGENIES

This chapter reviews statistical testing involving phylogenies. We present two approaches. The first is the classical framework, which relies on sampling distributions obtained through the bootstrap and permutation tests. The second is the Bayesian approach, which relies on posterior distributions. We give some examples of direct tests for deciding whether the data support a given tree, or a set of trees that share a particular property. We also discuss comparative analyses that use tests conditioning on the phylogeny being known. We introduce a continuous parameter space that avoids the delicate problem of comparing exponentially many possible models with a finite amount of data. The chapter also reviews the literature on parametric tests in phylogenetics and suggests some non-parametric tests. Finally, we present some open questions that mathematical statisticians will need to solve. Their answers would provide theoretical justification both for current testing strategies and for as-yet underdeveloped areas of statistical testing in non-standard frameworks.
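The phylogenetic bootstrap mentioned above has three steps. First, resample alignment columns with replacement. Second, re-estimate the tree on each pseudo-alignment. Third, report how often each topology (or clade) is recovered. The Python sketch below illustrates that loop under strong simplifying assumptions. It uses only four taxa, and a simple four-point distance criterion stands in for a real tree-building method such as maximum likelihood, parsimony or neighbor joining. All function names and the toy alignment are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# The three possible unrooted topologies for four taxa, written as splits.
SPLITS = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def four_point_topology(alignment):
    """Stand-in tree estimator (assumption, not a full method): choose the
    quartet split whose two within-pair distances have the smallest sum.
    Ties go to the first split listed in SPLITS."""
    def d(x, y):
        return p_distance(alignment[x], alignment[y])

    scores = {name: d(*p) + d(*q) for name, (p, q) in SPLITS.items()}
    return min(scores, key=scores.get)


def resample_columns(alignment, rng):
    """Nonparametric bootstrap: draw n columns with replacement."""
    n = len(next(iter(alignment.values())))
    idx = [rng.randrange(n) for _ in range(n)]
    return {taxon: "".join(seq[i] for i in idx) for taxon, seq in alignment.items()}


def bootstrap_support(alignment, n_boot=1000, seed=0):
    """Fraction of bootstrap replicates that recover each topology."""
    rng = random.Random(seed)
    counts = Counter(
        four_point_topology(resample_columns(alignment, rng)) for _ in range(n_boot)
    )
    return {name: counts[name] / n_boot for name in SPLITS}


if __name__ == "__main__":
    # Hypothetical toy alignment in which A,B and C,D are close pairs.
    toy = {
        "A": "AAAAAAAAAA",
        "B": "AAAAAAAAAC",
        "C": "GGGGGAAAAA",
        "D": "GGGGGAAAAT",
    }
    print(four_point_topology(toy))
    print(bootstrap_support(toy))
```

On this toy alignment, the original data select AB|CD, and nearly all replicates should recover it. How such bootstrap proportions should be read as evidence for or against a tree is one of the testing questions the classical framework raises. The tree estimator is kept pluggable so that a real method could replace `four_point_topology` without changing the resampling loop.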
