Searching for virus phylotypes

Motivation: Large phylogenies are being built today to study virus evolution, trace the origin of epidemics, establish the mode of transmission and survey the appearance of drug resistance. However, no tool is available to quickly inspect these phylogenies and combine them with extrinsic traits (e.g. geographic location, risk group, presence of a given resistance mutation), seeking to extract strain groups of specific interest or requiring surveillance.

Results: We propose a new method for obtaining such groups, which we call phylotypes, from a phylogeny having taxa (strains) annotated with extrinsic traits. Phylotypes are subsets of taxa with close phylogenetic relationships and common trait values. The method combines ancestral trait reconstruction using parsimony, with combinatorial and numerical criteria measuring tree shape characteristics and the diversity and separation of the potential phylotypes. A shuffling procedure is used to assess the statistical significance of phylotypes. All algorithms have linear time complexity. This results in low computing times, typically a few minutes for the larger data sets with a number of shuffling steps. Two HIV-1 data sets are analyzed, one of which is large, containing >3000 strains of HIV-1 subtype C collected worldwide, where the method shows its ability to recover known clusters and transmission routes, and to detect new ones.

Availability: This method and companion tools are implemented in an interactive Web interface (www.phylotype.org), which provides a wide choice of graphical views and output formats, and allows for exploratory analyses of large data sets.

Contact: francois.chevenet@ird.fr, gascuel@lirmm.fr

Supplementary information: Supplementary data are available at Bioinformatics online.
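To make the two main ingredients named in the Results concrete (parsimony-based trait reconstruction and a shuffling test), here is a minimal Python sketch. It is not the authors' implementation and does not reproduce the phylotype criteria on tree shape, diversity and separation. It assumes a rooted binary tree written as nested tuples, a hypothetical `traits` dictionary mapping strain names to trait values, and uses the Fitch parsimony score as a stand-in statistic; the function names and the toy data are illustrative only.

```python
import random

def fitch(node, traits):
    """Bottom-up Fitch pass on a nested-tuple binary tree.

    Returns (candidate_states, n_changes) for the subtree rooted at `node`.
    Each node is visited once, so the pass is linear in the tree size.
    """
    if isinstance(node, str):                 # leaf: its observed trait value
        return {traits[node]}, 0
    left, right = node
    left_states, left_changes = fitch(left, traits)
    right_states, right_changes = fitch(right, traits)
    common = left_states & right_states
    if common:                                # children agree: no extra change
        return common, left_changes + right_changes
    # children disagree: keep the union and count one trait change
    return left_states | right_states, left_changes + right_changes + 1

def leaf_names(node):
    """List the leaf names of a nested-tuple tree, left to right."""
    if isinstance(node, str):
        return [node]
    return leaf_names(node[0]) + leaf_names(node[1])

def shuffle_p_value(tree, traits, n_shuffles=1000, seed=0):
    """Shuffle trait values across leaves and compare parsimony scores.

    Returns (observed_score, p_value), where the p-value is the share of
    shuffles whose score is at most the observed one (with a +1 correction).
    """
    rng = random.Random(seed)
    names = leaf_names(tree)
    observed = fitch(tree, traits)[1]
    values = [traits[name] for name in names]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(values)
        if fitch(tree, dict(zip(names, values)))[1] <= observed:
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)

# Toy example (made-up strains and countries, for illustration only).
tree = ((("s1", "s2"), ("s3", "s4")), (("s5", "s6"), ("s7", "s8")))
traits = {"s1": "ZA", "s2": "ZA", "s3": "ZA", "s4": "ZA",
          "s5": "BR", "s6": "BR", "s7": "BR", "s8": "BR"}
observed_score, p_value = shuffle_p_value(tree, traits)
print(observed_score, p_value)
```

A low parsimony score combined with a small p-value indicates that trait values cluster on the tree more tightly than expected by chance. The recursion is adequate for small trees; for phylogenies with thousands of strains an iterative post-order traversal would avoid Python's recursion limit.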
