Confidence in evolutionary trees from biological sequence data

The reliable construction of evolutionary trees from nucleotide sequences often depends on randomization tests such as the bootstrap1 and PTP (cladistic permutation tail probability) tests2–6. The genomes of bacteria7, viruses8, animals7,9,10 and plants11, however, vary widely in their nucleotide frequencies. Where genomes have independently acquired similar G+C base compositions, signals arise in the data that cause methods of evolutionary tree reconstruction to estimate the wrong tree by grouping together sequences with similar G+C content12–14. Under these conditions randomization tests can lead both to the rejection of the correct evolutionary hypothesis and to the acceptance of an incorrect one (as with the contradictory inferences from the photosynthetic rbcS and rbcL sequences14). We have proposed one approach to testing for the G+C content problem15. Here we present a formalization of this method, a frequency-dependent significance test, which has general application.
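To make the G+C content problem concrete, the following minimal Python sketch computes the G+C fraction of each sequence in a small alignment, which is the quantity whose convergence between unrelated lineages can mislead tree reconstruction. It is an illustration only, not the frequency-dependent significance test presented here; the function name `gc_content` and the toy sequences are hypothetical and are not taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T/U) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


# Hypothetical toy data (an assumption for illustration, not real sequences):
# taxa A and B are G+C-rich, taxa C and D are A+T-rich.
toy_alignment = {
    "taxon_A": "GCGCGGCCATGC",
    "taxon_B": "GGCCGCGCTAGC",
    "taxon_C": "ATATTAGCATAT",
    "taxon_D": "TTAATAGCAATA",
}

gc_by_taxon = {name: gc_content(seq) for name, seq in toy_alignment.items()}

for name, gc in sorted(gc_by_taxon.items()):
    print(f"{name}: G+C = {gc:.2f}")
```

If taxa with similar G+C values are grouped together in an estimated tree, and that similarity was acquired independently, a high bootstrap or PTP score for the grouping may reflect base composition rather than shared ancestry.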

[1]  José L. Oliver, et al. Chloroplast genes transferred to the nuclear plant genome have adjusted to nuclear base composition and codon usage, 1990, Nucleic Acids Res.

[2]  R. Crozier, et al. The mitochondrial genome of the honeybee Apis mellifera: complete sequence and genome organization, 1993, Genetics.

[3]  D. Penny, et al. Controversy on chloroplast origins, 1992, FEBS Letters.

[4]  D. Penny, et al. The Problem of GC Content, Evolutionary Trees and the Origins of Chl-a/b Photosynthetic Organelles: Are the Prochlorophytes a Eubacterial Model for Higher Plant Photosynthesis?, 1992.

[5]  Masami Hasegawa, et al. Ribosomal RNA trees misleading?, 1993, Nature.

[6]  D. Penny, et al. Trees from sequences: panacea or Pandora's box, 1990.

[7]  James W. Archie, et al. A randomization test for phylogenetic information in systematic data, 1989.

[8]  D. Penny, et al. Models for the origin of influenza viruses, 1987, Nature.

[9]  N. Murata. Research in Photosynthesis, 1992.

[10]  Daniel P. Faith, et al. Could a cladogram this short have arisen by chance alone?: On permutation tests for cladistic structure, 1991.

[11]  P. Keese, et al. Nucleotide sequence of the genome of an Australian isolate of turnip yellow mosaic tymovirus, 1989, Virology.

[12]  D. Penny, et al. Influenza viruses, comets and the science of evolutionary trees, 1989, Journal of Theoretical Biology.

[13]  J. Felsenstein. Confidence limits on phylogenies: An approach using the bootstrap, 1985, Evolution.

[14]  Michael D. Hendy, et al. Significance of the length of the shortest tree, 1992.