The sampling distributions and covariance matrix of phylogenetic spectra

We extend recent advances in computing variance-covariance matrices from genetic distances to a sequence method of phylogenetic analysis. These matrices, together with other statistical properties of corrected sequence spectra, are studied as a foundation for more powerful and testable methods in phylogenetics. We start with $\hat{s}$, a vector of the proportion of sites in a sequence of length $c$ showing each of the possible character-state patterns for $t$ taxa. Hadamard conjugations are then used to calculate $\hat{\gamma}$, a vector of the support for bipartitions, or splits, in the data, after correcting for all implied multiple changes. These corrections are made independently of any tree and are illustrated with Cavender's two-character-state model. Each entry in $\hat{\gamma}$ ($\hat{\gamma}_0$ excluded) that is not associated with an edge on the tree that generated the data is an invariant (sensu Cavender) with an expected value of 0 as the number of sites $c \to \infty$. Under an independent identically distributed model (sites are independent and identically distributed), the vector $\hat{s}$ is a random sample from a scaled multinomial distribution. Starting from this point, we illustrate the derivation of $V[\hat{\gamma}]$, the variance-covariance matrix of $\hat{\gamma}$. The bias induced by the delta method, a convenient approximation in deriving $V[\hat{\gamma}]$, is evaluated for both population and sample variance-covariance matrices. It is found to be acceptable in the first case and very good in the second. Likewise, bias in $\hat{\gamma}$ due to a logarithmic transform and to short sequences is also acceptable. We infer the marginal distributions of entries in $\hat{\gamma}$. Simulations with illustrative values of $c$ and $\lambda$ (the rate per site) show how $\hat{\gamma}$ tends to multivariate normal as $c \to \infty$. Our results extend naturally to four-color (nucleotide) spectra.
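The pipeline summarized above can be sketched numerically. The following Python sketch is illustrative only and is not the authors' code. It assumes the standard form of the Hadamard conjugation for the two-state model, $\gamma = H^{-1}\ln(Hs)$ with $H$ a Sylvester-type Hadamard matrix of order $2^{t-1}$, a multinomial covariance $(\mathrm{diag}(s) - ss^{T})/c$ for the observed spectrum, and a first-order delta-method approximation $J\,V[s]\,J^{T}$ with $J = H^{-1}\,\mathrm{diag}(1/(Hs))\,H$. The function names and the four-taxon edge lengths are hypothetical choices made for the example.

```python
import numpy as np


def hadamard(t):
    """Sylvester Hadamard matrix of order 2**(t-1) for t taxa."""
    H = np.array([[1.0]])
    for _ in range(t - 1):
        H = np.block([[H, H], [H, -H]])
    return H


def s_from_gamma(gamma, H):
    """Expected sequence spectrum from a split spectrum: s = H^-1 exp(H gamma)."""
    n = H.shape[0]
    return H @ np.exp(H @ gamma) / n


def gamma_from_s(s, H):
    """Hadamard conjugation: gamma = H^-1 ln(H s). Requires H s > 0."""
    n = H.shape[0]
    return H @ np.log(H @ s) / n


def delta_cov_gamma(s, c, H):
    """First-order (delta method) approximation to V[gamma] for c i.i.d. sites."""
    n = H.shape[0]
    # Covariance of observed pattern proportions under multinomial sampling.
    V_s = (np.diag(s) - np.outer(s, s)) / c
    # Jacobian of gamma with respect to s.
    J = H @ np.diag(1.0 / (H @ s)) @ H / n
    return J @ V_s @ J.T


# Hypothetical four-taxon tree 12|34. Index i encodes a split as a bitmask over
# taxa 1..3: 1, 2, 4 and 7 are pendant edges, 3 is the internal edge {1,2}|{3,4},
# and 5 and 6 are splits that are not on the tree.
H = hadamard(4)
gamma = np.array([0.0, 0.10, 0.10, 0.05, 0.10, 0.0, 0.0, 0.10])
gamma[0] = -gamma[1:].sum()  # gamma_0 is minus the sum of the other entries

s = s_from_gamma(gamma, H)           # expected pattern proportions
gamma_back = gamma_from_s(s, H)      # conjugation recovers the split spectrum
V_gamma = delta_cov_gamma(s, c=1000, H=H)
```

Here `gamma_back[5]` and `gamma_back[6]` come out at zero up to rounding because the input is the exact expected spectrum; with a spectrum sampled from finite $c$ they would scatter around zero, and `V_gamma` gives an approximate scale for that scatter.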