The evolution of genome compression and genomic novelty in RNA viruses.

The genomes of RNA viruses are characterized by their extremely small size and extremely high mutation rates (typically 10 kb and 10^-4 per base per replication cycle, respectively), traits that are thought to be causally linked. One aspect of their small size is the genome compression caused by the use of overlapping genes (where some nucleotides code for two genes). Using a comparative analysis of all known RNA viral species, we show that viruses with larger genomes tend to have less gene overlap. We provide a numerical model to show how a high mutation rate could lead to gene overlap, and we discuss the factors that might explain the observed relationship between gene overlap and genome size. We also propose a model for the evolution of gene overlap based on the co-opting of previously unused ORFs, which gives rise to two types of overlap: (1) the creation of novel genes inside older genes, predominantly via +1 frameshifts, and (2) the incremental increase in overlap between originally contiguous genes, with no frameshift preference. Both types of overlap are viewed as the creation of genomic novelty under pressure for genome compression. Simulations based on our model generate the empirical size distributions of overlaps and explain the observed frameshift preferences. We suggest that RNA viruses are a good model system for investigating general evolutionary relationships among genome attributes such as mutational robustness, mutation rate, and size.
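To make the link between genome size and mutation rate concrete, the following minimal sketch computes the fraction of replications that copy a genome without any error, using only the typical figures quoted above (10 kb, 10^-4 per base per replication cycle). It assumes independent per-base errors and is an illustration only, not the numerical model referred to in the abstract; the function name `error_free_fraction` is hypothetical.

```python
import math


def error_free_fraction(genome_length, mutation_rate):
    """Probability that one replication copies every base correctly.

    Assumes each base mutates independently with the same probability
    (an illustrative simplification, not the paper's model).
    """
    return (1.0 - mutation_rate) ** genome_length


# Typical RNA virus figures quoted in the abstract.
typical = error_free_fraction(10_000, 1e-4)

# A hypothetical genome compressed to 8 kb at the same mutation rate.
shorter = error_free_fraction(8_000, 1e-4)

print(f"10 kb genome: {typical:.3f} of copies are error-free")
print(f" 8 kb genome: {shorter:.3f} of copies are error-free")
```

With these figures the expected number of mutations per genome per replication is about one, so only roughly exp(-1), or a little over a third, of copies are error-free. Under the same assumption, a shorter genome at the same per-base rate yields a larger error-free fraction, which is one simple way to see why a high mutation rate could favor compression.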
