REPARATION: ribosome profiling assisted (re-)annotation of bacterial genomes

Abstract: Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION; https://github.com/Biobix/REPARATION), a de novo machine learning algorithm that uses experimental evidence of protein synthesis from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation. REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds, based on a growth curve model, to screen out spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive map of their translation landscape, supported by experimental data. In all cases, we identified hundreds of novel (small) ORFs, including variants of previously annotated ORFs, and REPARATION predicted more than 70% of all (variants of) annotated protein-coding ORFs to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition, and conservation analysis. REPARATION is unique in that it uses experimental translation evidence to perform a de novo ORF delineation in bacterial genomes, irrespective of the sequence features linked to open reading frames.
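As an illustration of the first step described in the abstract, evaluating all possible ORFs in a genome, the following is a minimal Python sketch of exhaustive start-to-stop enumeration on both strands. It is not the REPARATION implementation: the start codon set (ATG, GTG, TTG), the `min_codons` length cut-off, and all function names are assumptions made for this example, and the Ribo-seq based scoring and the growth curve thresholds are not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumed start codon set for illustration; not taken from the paper.
START_CODONS = {"ATG", "GTG", "TTG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_on_strand(seq, min_codons):
    """Yield (start, end) for every in-frame start-to-stop candidate.

    Coordinates are 0-based, half-open, and relative to `seq`.
    Every start codon upstream of a stop (back to the previous in-frame
    stop) gives its own candidate, so start-site variants are all kept.
    `min_codons` counts the stop codon.
    """
    for frame in range(3):
        starts = []  # start codons seen since the last in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                end = i + 3
                for s in starts:
                    if (end - s) // 3 >= min_codons:
                        yield (s, end)
                starts = []
            elif codon in START_CODONS:
                starts.append(i)


def candidate_orfs(genome, min_codons=10):
    """Enumerate candidate ORFs on both strands in forward coordinates."""
    genome = genome.upper()
    n = len(genome)
    orfs = [(s, e, "+") for s, e in orfs_on_strand(genome, min_codons)]
    for s, e in orfs_on_strand(reverse_complement(genome), min_codons):
        # Map reverse-strand coordinates back onto the forward strand.
        orfs.append((n - e, n - s, "-"))
    return orfs


if __name__ == "__main__":
    toy = "CCATGAAAGTGCCCTAAGG"  # hypothetical toy sequence
    for orf in candidate_orfs(toy, min_codons=2):
        print(orf)
```

In a pipeline of the kind the abstract describes, each candidate returned here would then be scored against the Ribo-seq read coverage before any filtering; this sketch stops at candidate generation.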
