Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18S rDNAs of Apicomplexa

The reconstruction of phylogenetic history depends on establishing accurate hypotheses of character homology, which, for studies based on molecular sequence data, involves sequence alignment. In an empirical study of nucleotide sequence alignment, we inferred phylogenetic trees for 43 species of the Apicomplexa and 3 species of the Dinozoa from complete small-subunit rDNA sequences, using six different multiple-alignment procedures: manual alignment based on the secondary structure of the 18S rRNA molecule, and automated similarity-based alignment with the PileUp, ClustalW, TreeAlign, MALIGN, and SAM computer programs. Trees were constructed with the neighbor-joining, weighted-parsimony, and maximum-likelihood methods. All of the multiple-alignment procedures yielded the same basic structure for the estimated phylogenetic relationships among the taxa, which presumably represents the underlying phylogenetic signal. However, the placement of many taxa was sensitive to the alignment procedure used, and the trees produced from the different alignments were, on average, more dissimilar from each other than were the trees produced by the different tree-building methods. The multiple alignments from the different procedures varied greatly in length, but aligned sequence length was not a good predictor of the similarity of the resulting phylogenetic trees. We also systematically varied the gap weights (the relative costs of opening a new gap in a sequence and of extending an existing gap) in the ClustalW program, and this produced alignments that were at least as different from each other as those produced by the different alignment algorithms. Furthermore, no combination of gap weights produced the same tree as the structure alignment, even though many of the resulting alignments were similar in length to the structure alignment.
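The gap weights varied in ClustalW are the two parameters of an affine gap cost: one cost for opening a gap and a smaller one for each extension. A minimal sketch of how they enter an alignment score is a Gotoh-style global pairwise alignment. This is an illustration under stated assumptions, not ClustalW's progressive multiple-alignment algorithm: the match/mismatch scores, the toy sequences, and the names `align` and `gap_opens` are hypothetical, and a gap run of length k is assumed to cost gap_open + (k - 1) * gap_extend.

```python
"""Sketch: global pairwise alignment with affine gap costs (Gotoh-style DP).

Hypothetical illustration only: this is NOT ClustalW, which builds a
progressive multiple alignment with position-specific gap penalties.
Match/mismatch scores and gap weights here are made-up values.
"""

NEG_INF = float("-inf")


def align(a, b, match=2, mismatch=-1, gap_open=5, gap_extend=1):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment.

    Assumed convention: a gap run of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]; X: a[i-1] opposite a gap; Y: gap opposite b[j-1]
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Either open a new gap (from M or the other gap state) or extend this one
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)

    mats = {"M": M, "X": X, "Y": Y}
    state = max(mats, key=lambda k: mats[k][n][m])
    score = mats[state][n][m]

    # Traceback: at each step, pick a predecessor state consistent with the recurrence
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        cur = mats[state][i][j]
        if state == "M":
            s = match if a[i - 1] == b[j - 1] else mismatch
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
            state = next(k for k in "MXY" if mats[k][i][j] == cur - s)
        elif state == "X":
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
            state = "X" if X[i][j] - gap_extend == cur else (
                "M" if M[i][j] - gap_open == cur else "Y")
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
            state = "Y" if Y[i][j] - gap_extend == cur else (
                "M" if M[i][j] - gap_open == cur else "X")
    return score, "".join(reversed(out_a)), "".join(reversed(out_b))


def gap_opens(aligned):
    """Number of separate gap runs (gap openings) in one aligned sequence."""
    return sum(1 for k, c in enumerate(aligned)
               if c == "-" and (k == 0 or aligned[k - 1] != "-"))


if __name__ == "__main__":
    a, b = "GATTACAGATTACA", "GATCAGATTTACA"  # made-up fragments, not real rDNA
    for go in (1, 3, 8):  # vary the gap-opening weight, keep extension fixed
        score, x, y = align(a, b, gap_open=go, gap_extend=1)
        print(f"gap_open={go} score={score} openings={gap_opens(x) + gap_opens(y)}")
        print(x)
        print(y)
```

Running the example with different `gap_open` values on the same pair of sequences can change where gaps are placed. In this scoring scheme, raising the opening cost while holding the extension cost fixed never increases the number of separate gaps in the optimal alignment, which is one concrete way gap weights reshape the positional homology hypotheses that are passed on to tree building.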
We also investigated the phylogenetic information content of the helical and nonhelical regions of the rDNA, and found the helical regions to be the more informative. Overall, we conclude that many of the disagreements in the literature concerning the phylogeny of the Apicomplexa are probably due to differences in sequence alignment strategies rather than to differences in data or tree-building methods.
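The abstract does not state how information content was measured for the helical and nonhelical regions. One simple proxy, assumed here purely for illustration, is the number of parsimony-informative columns in each region: columns with at least two nucleotide states, each present in at least two taxa. The sketch takes a per-column region mask with hypothetical labels `H` (helical) and `N` (nonhelical), which in practice would come from a secondary-structure model like the one behind the manual alignment, treats gaps and ambiguity codes as missing, and reports informative and total columns per region. The function names are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two nucleotide states each occur in at least two taxa.

    Gaps and ambiguity codes are ignored (treated as missing data).
    """
    counts = Counter(c for c in column.upper() if c in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_by_region(alignment, mask):
    """Map each region label to (informative columns, total columns).

    `mask` has one hypothetical label per alignment column, e.g. 'H' for
    helical (paired) and 'N' for nonhelical positions.
    """
    if any(len(seq) != len(mask) for seq in alignment):
        raise ValueError("every sequence and the mask must have equal length")
    totals, informative = Counter(mask), Counter()
    for k, label in enumerate(mask):
        column = "".join(seq[k] for seq in alignment)
        if is_parsimony_informative(column):
            informative[label] += 1
    return {label: (informative[label], totals[label]) for label in totals}


if __name__ == "__main__":
    toy = ["ACGTAC", "ACGTTC", "AGGAAC", "AGGATC"]  # made-up, not real rDNA
    print(informative_by_region(toy, "HHHNNN"))  # {'H': (1, 3), 'N': (2, 3)}
```

Counting informative sites is only a rough first screen: it ignores homoplasy, and paired helical positions can change together through compensatory substitutions, so they are not fully independent characters.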