ORFik: a comprehensive R toolkit for the analysis of translation

Background: With the rapid growth in the use of high-throughput methods for characterizing translation and the continued expansion of multi-omics, there is a need for back-end functions and streamlined tools for processing, analyzing, and characterizing data produced by these assays.

Results: Here, we introduce ORFik, a user-friendly R/Bioconductor API and toolbox for studying translation and its regulation. It extends GenomicRanges from the genome to the transcriptome and implements a framework that integrates data from several sources. ORFik streamlines the processing, analysis, and visualization of the different steps of translation, with a particular focus on initiation and elongation. It accepts high-throughput sequencing data from ribosome profiling to quantify ribosome elongation, or from RCP-seq/TCP-seq to additionally quantify ribosome scanning. In addition, ORFik can use CAGE data to accurately determine 5′ UTRs and RNA-seq data to determine translation relative to RNA abundance. ORFik supports and calculates over 30 different translation-related features and metrics from the literature and can annotate translated regions such as proteins or upstream open reading frames (uORFs). As a use case, we demonstrate using ORFik to rapidly annotate the dynamics of 5′ UTRs across different tissues, detect their uORFs, and characterize their scanning and translation in the downstream protein-coding regions.

Conclusion: In summary, ORFik introduces hundreds of tested, documented, and optimized methods. ORFik is designed to be easily customizable, enabling users to create complete workflows from raw data to publication-ready figures for several types of sequencing data. Finally, by improving the speed and scope of many core Bioconductor functions, ORFik offers enhancements that benefit the entire Bioconductor environment.

Availability: http://bioconductor.org/packages/ORFik
