The relative performance of indel-coding methods in simulations.

We used simulations to compare the performance of 10 approaches that have been used for treating unambiguously aligned gaps in phylogenetic analyses. We examined how these approaches perform under the ideal conditions of correct alignments, as well as how robust they are to errors introduced by the use of inferred alignments. Our results indicate that 5th-state coding dramatically outperformed all other coding methods, which in turn all outperformed treating gaps as missing data or excluding gapped positions. Simple indel coding (SIC) and modified complex indel coding (MCIC) performed about equally well, and both generally outperformed the other indel-coding methods. The high performance of 5th-state coding was found to be largely a weighting artifact. We suggest that MCIC-coded gap characters be scored for all unambiguously aligned gaps in parsimony-based molecular phylogenetic analyses. When the number of terminals sampled precludes the use of MCIC, SIC may be used as an effective substitute.
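
As a rough illustration of the weighting issue, the sketch below codes a toy alignment with a simplified version of SIC and counts the columns that 5th-state coding would score. It is not the implementation used in the study: the function names and the toy alignment are hypothetical, terminal gaps are not treated specially, and the rule that a sequence whose longer gap spans a shorter one is scored as inapplicable ("?") follows our reading of the usual description of SIC.

```python
import re

def gap_runs(seq):
    """Return (start, end) column indices (0-based, inclusive) of each gap run."""
    return [(m.start(), m.end() - 1) for m in re.finditer(r"-+", seq)]

def simple_indel_coding(alignment):
    """Simplified SIC: one presence/absence character per distinct gap (start, end).

    Assumption: terminal gaps are coded like internal ones; in practice they
    are often treated as missing data instead.
    """
    runs = {name: gap_runs(seq) for name, seq in alignment.items()}
    indels = sorted({r for rs in runs.values() for r in rs})
    matrix = {}
    for name, rs in runs.items():
        row = []
        for s, e in indels:
            if (s, e) in rs:
                row.append("1")  # this exact gap is present
            elif any(s2 <= s and e <= e2 for s2, e2 in rs):
                row.append("?")  # spanned by a longer gap: inapplicable
            else:
                row.append("0")  # gap absent
        matrix[name] = "".join(row)
    return indels, matrix

def gapped_columns(alignment):
    """Count columns containing a gap, i.e. columns where 5th-state coding scores '-'."""
    return sum(1 for col in zip(*alignment.values()) if "-" in col)

# Hypothetical toy alignment, for illustration only.
alignment = {
    "A": "ACGT--ACGT",
    "B": "ACGT--ACGT",
    "C": "ACG----CGT",
    "D": "ACGTTTACGT",
}

indels, sic_matrix = simple_indel_coding(alignment)
n_gapped = gapped_columns(alignment)
print(indels)      # distinct gaps, one SIC character each
print(sic_matrix)  # SIC character rows per terminal
print(n_gapped)    # columns scored under 5th-state coding
```

In this toy case the two distinct gaps yield two SIC characters, whereas 5th-state coding scores a gap state in four columns, so a single multi-position indel event can contribute several character-state changes rather than one.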
