Whole-proteome phylogeny of large dsDNA viruses and parvoviruses through a composition vector method related to dynamical language model

Background
The vast sequence divergence among different virus groups presents a great challenge to alignment-based analysis of virus phylogeny. Because of the uncertainty inherent in aligning such divergent sequences, existing tools for phylogenetic analysis based on multiple alignment cannot be directly applied to whole-genome comparison and phylogenomic studies of viruses. There has been growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among these, a dynamical language (DL) method proposed by our group has been successfully applied to the phylogenetic analysis of bacterial and chloroplast genomes.

Results
In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets that differ greatly in genome size. The trees from our analyses are in good agreement with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV).

Conclusions
The present method provides a new way of recovering the phylogeny of large dsDNA viruses and parvoviruses, and also offers some insights into the affiliation of a number of unclassified viruses. By comparison, some alignment-free methods, such as the CV Tree method, can recover the phylogeny of large dsDNA viruses but are not suitable for resolving the phylogeny of parvoviruses, whose genomes are much smaller.
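
The abstract contrasts the DL method with the CV Tree method but does not spell out either computation. As a rough illustration of the composition vector idea only, the Python sketch below follows our understanding of the CVTree-style approach: count overlapping K-strings pooled over all proteins of a proteome, subtract a (K-2)th-order Markov prediction f0(a1...aK) = f(a1...aK-1) f(a2...aK) / f(a2...aK-1), and compare two proteomes by D = (1 - C)/2, where C is the cosine correlation of their vectors. This is not the DL method used in this paper, whose details are not given here; the function names, the choice K = 3, and the toy sequences are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def kstring_freqs(proteome, k):
    """Relative frequencies of overlapping k-strings pooled over all proteins.

    Each protein of length L contributes L - k + 1 windows; frequencies are
    normalised by the total number of windows in the proteome.
    """
    counts = Counter()
    for seq in proteome:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def composition_vector(proteome, k):
    """Markov-subtracted composition vector a(w) = (f(w) - f0(w)) / f0(w).

    f0 predicts f(w) from (k-1)- and (k-2)-string frequencies:
    f0(a1..ak) = f(a1..ak-1) * f(a2..ak) / f(a2..ak-1).
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    fk = kstring_freqs(proteome, k)
    fk1 = kstring_freqs(proteome, k - 1)
    fk2 = kstring_freqs(proteome, k - 2)
    vec = {}
    for w, f in fk.items():
        # Every substring of an observed k-string is itself observed, so the
        # lookups below always succeed and f0 is strictly positive.
        f0 = fk1[w[:-1]] * fk1[w[1:]] / fk2[w[1:-1]]
        vec[w] = (f - f0) / f0
    return vec


def cv_distance(a, b):
    """D = (1 - C) / 2, with C the cosine correlation of two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine correlation is undefined for a zero vector")
    return (1 - dot / (norm_a * norm_b)) / 2


if __name__ == "__main__":
    # Toy "proteomes" invented for illustration only.
    proteome_a = ["MKVLAAGIVGLLLAAQA", "MSTNPKPQRKTKRNTN"]
    proteome_b = ["MKVLSAGIVGLLLSAQA", "MSTNPKPQRKSKRNTN"]
    k = 3
    d = cv_distance(composition_vector(proteome_a, k),
                    composition_vector(proteome_b, k))
    print(f"D(a, b) at k={k}: {d:.4f}")
```

Running the file prints one distance for the two toy proteomes. In a real analysis, such distances would be computed for every pair of proteomes and the resulting matrix passed to a distance-based tree-building method such as neighbor joining.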
