Dinucleotide Composition in Animal RNA Viruses Is Shaped More by Virus Family than by Host Species

ABSTRACT Viruses use the cellular machinery of their hosts for replication. It has therefore been proposed that the nucleotide and dinucleotide compositions of viruses should match those of their host species. If this holds, it may be possible to use dinucleotide composition to predict the true host species of viruses sampled in metagenomic surveys. However, it is also clear that different taxonomic groups of viruses tend to have distinctive patterns of dinucleotide composition that may be independent of host species. To determine the relative strength of the effect of host versus virus family in shaping dinucleotide composition, we performed a comparative analysis of 20 RNA virus families from 15 host groupings, spanning two animal phyla and more than 900 virus species. In particular, we determined the odds ratios for the 16 possible dinucleotides and performed a discriminant analysis to evaluate how well the dinucleotide composition of a virus predicts its virus family or the host taxon from which it was isolated. Notably, while 81% of the data analyzed here were assigned to the correct virus family, only 62% were assigned to the correct host subphylum/class and a mere 32% to the correct mammalian order. Similarly, dinucleotide composition has weak predictive power for different hosts within individual virus families. We therefore conclude that dinucleotide composition is generally uniform within a virus family but reflects that of the host species less well. This has obvious implications for attempts to accurately predict host species from virus genome sequences alone.

IMPORTANCE Determining the processes that shape virus genomes is central to understanding virus evolution and emergence. One question of particular importance is why nucleotide and dinucleotide frequencies differ so markedly between viruses. In particular, it is currently unclear whether host species or virus family has the greater impact on dinucleotide frequencies and whether dinucleotide composition can be used to accurately predict host species. Using a comparative analysis, we show that dinucleotide composition has a strong phylogenetic association across different RNA virus families, such that dinucleotide composition can predict the family from which a virus sequence has been isolated. Conversely, dinucleotide composition has poorer predictive power for the different host species within a virus family and across different virus families, indicating that the host has a relatively small impact on the dinucleotide composition of a virus genome.
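
The dinucleotide odds ratios mentioned above can be illustrated with a minimal sketch. The abstract does not state the exact formula used, so this assumes the widely used definition, ρ_XY = f_XY / (f_X · f_Y), where f_XY is the observed frequency of dinucleotide XY and f_X, f_Y are the frequencies of its component bases. The function name `dinucleotide_odds_ratios` and the handling of ambiguous bases (they are simply skipped) are illustrative choices, not taken from the study.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"


def dinucleotide_odds_ratios(seq):
    """Return {dinucleotide: rho} for the 16 RNA dinucleotides.

    Assumed definition: rho_XY = f_XY / (f_X * f_Y).
    DNA-style input is accepted; T is converted to U.
    """
    seq = seq.upper().replace("T", "U")

    # Count single bases, ignoring ambiguous symbols such as N.
    mono = Counter(b for b in seq if b in BASES)

    # Count overlapping dinucleotides in which both bases are unambiguous.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )

    n_mono = sum(mono.values())
    n_di = sum(di.values())

    ratios = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono) if n_mono else 0.0
        observed = di[x + y] / n_di if n_di else 0.0
        # The ratio is undefined when a component base never occurs.
        ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    # Toy sequence, not real viral data.
    for dinuc, rho in dinucleotide_odds_ratios("AUGGCGUACGCUUAGC").items():
        print(dinuc, round(rho, 3))
```

Under this definition a ratio of 1 means the dinucleotide occurs as often as its base frequencies predict, and values below or above 1 indicate under- or over-representation. The 16 ratios for each genome form the kind of feature vector that a discriminant analysis could take as input.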
