Incorporating indel information into phylogeny estimation for rapidly emerging pathogens

Background
Phylogenies of rapidly evolving pathogens can be difficult to resolve because of the small number of substitutions that accumulate in the short time since divergence. To improve resolution of such phylogenies we propose using insertion and deletion (indel) information in addition to substitution information. We accomplish this through joint estimation of alignment and phylogeny in a Bayesian framework, drawing inference using Markov chain Monte Carlo. Joint estimation of alignment and phylogeny sidesteps biases that stem from conditioning on a single alignment by taking into account the ensemble of near-optimal alignments.

Results
We introduce a novel Markov chain transition kernel that improves computational efficiency by proposing non-local topology rearrangements and by block sampling alignment and topology parameters. In addition, we extend our previous indel model to increase biological realism by placing indels preferentially on longer branches. We demonstrate the ability of indel information to increase phylogenetic resolution in examples drawn from within-host viral sequence samples. We also demonstrate the importance of taking alignment uncertainty into account when using such information. Finally, we show that codon-based substitution models can significantly affect alignment quality and phylogenetic inference by unrealistically forcing indels to begin and end between codons.

Conclusion
These results indicate that indel information can improve phylogenetic resolution of recently diverged pathogens and that alignment uncertainty should be considered in such analyses.
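To illustrate why a branch-length-dependent indel model places indels preferentially on longer branches, here is a minimal sketch. It assumes, hypothetically, that indel events arrive as a Poisson process with a fixed rate per unit branch length. This is not necessarily the indel model used in this work; the function name `indel_probability` and the `indel_rate` parameter are illustrative only.

```python
import math


def indel_probability(branch_length: float, indel_rate: float) -> float:
    """Probability of at least one indel event on a branch.

    Hypothetical assumption: indel events follow a Poisson process with
    rate `indel_rate` per unit branch length, so the number of events on
    a branch of length t is Poisson(indel_rate * t).
    """
    if branch_length < 0 or indel_rate < 0:
        raise ValueError("branch_length and indel_rate must be non-negative")
    # P(at least one event) = 1 - P(zero events) = 1 - exp(-rate * t)
    return 1.0 - math.exp(-indel_rate * branch_length)


if __name__ == "__main__":
    rate = 0.1  # hypothetical indel rate per unit branch length
    for t in (0.01, 0.1, 1.0):
        print(f"t={t}: P(indel) = {indel_probability(t, rate):.4f}")
```

Under this assumption the probability of an indel grows monotonically with branch length, whereas a model with a fixed per-branch indel probability would place indels on short and long branches equally often.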