Real Time Classification of Viruses in 12 Dimensions

The International Committee on Taxonomy of Viruses authorizes and organizes the taxonomic classification of viruses. Thus far, the detailed classifications for all viruses are neither complete nor free from dispute. For example, the current missing label rates in GenBank are 12.1% for family labels and 30.0% for genus labels. Using the proposed Natural Vector representation, all 2,044 single-segment referenced viral genomes in GenBank can be embedded in 12-dimensional space. Unlike other approaches, this allows us to determine phylogenetic relations for all viruses at any level (e.g., Baltimore class, family, subfamily, genus, and species) in real time. Additionally, the proposed graphical representation for virus phylogeny provides a visualization of the distribution of viruses in this space. Unlike the commonly used tree visualization methods, which suffer from uniqueness and existence problems, our representation always exists and is unique. This approach is successfully used to predict and correct viral classification information, as well as to identify viral origins; for example, in our visualization the West Nile virus, a recent public health threat, lies closer to the Japanese encephalitis antigenic complex. In cross-validation, the accuracy of our predictions reaches 98.2% for Baltimore class labels, 96.6% for family labels, 99.7% for subfamily labels, and 97.2% for genus labels.
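The abstract does not spell out how a genome becomes a point in 12 dimensions, so the following Python sketch is only an illustration under stated assumptions. It assumes the commonly cited form of a natural vector: for each of the four nucleotides, the count, the mean position, and a normalized second central moment of the positions, giving 4 x 3 = 12 coordinates. The function names `natural_vector` and `nearest_label`, the Euclidean distance, and the nearest-neighbor labeling rule are hypothetical choices made for this example and are not taken from the paper.

```python
from math import dist

NUCLEOTIDES = "ACGT"


def natural_vector(seq):
    """Map a nucleotide string to a 12-dimensional vector (assumed definition).

    For each nucleotide k the vector holds:
      n_k  : number of occurrences,
      mu_k : mean of its 1-based positions,
      d2_k : sum of squared deviations from mu_k, divided by n_k * n,
    where n is the sequence length. Absent nucleotides contribute zeros.
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in NUCLEOTIDES:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_k = len(positions)
        if n_k == 0:
            mu_k, d2_k = 0.0, 0.0
        else:
            mu_k = sum(positions) / n_k
            d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def nearest_label(query, labeled):
    """Return the label of the labeled sequence closest to `query`.

    `labeled` is a list of (sequence, label) pairs. Closeness is Euclidean
    distance between natural vectors, which is an assumption of this sketch.
    """
    q = natural_vector(query)
    best = min(labeled, key=lambda item: dist(q, natural_vector(item[0])))
    return best[1]
```

Because each genome is reduced to a fixed-length vector without any alignment, comparing a new sequence against a labeled collection only requires distance computations between short vectors, which is consistent with the real-time claim in the abstract. For instance, `nearest_label("AAAAAAAC", [("AAAAAAAA", "x"), ("CCCCCCCC", "y")])` selects the label of the toy sequence whose vector is closest.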
