PROTEOFORMER 2.0: Further Developments in the Ribosome Profiling-assisted Proteogenomic Hunt for New Proteoforms

The PROTEOFORMER pipeline feeds ribosome profiling-derived information into an MS/MS search space. The pipeline has been greatly expanded and updated since its first publication. These new features are presented and validated with matching MS/MS data, leading to the MS/MS-level confirmation of a set of new proteoforms and to a collection of general considerations for the ribosome profiling-based proteogenomics community.

Highlights
PROTEOFORMER adds ribosome profiling information to MS/MS search spaces.
The PROTEOFORMER pipeline has been greatly expanded and updated since its first publication.
New features are demonstrated with matching ribosome profiling and MS/MS data.
Experiments yield MS/MS-validated proteoforms and general proteogenomic considerations.

PROTEOFORMER is a pipeline that automates the processing of ribosome profiling data (RIBO-seq, i.e. the sequencing of ribosome-protected mRNA fragments). Genome-wide ribosome occupancies are used to delineate data-specific candidate translation products, which can improve mass spectrometry-based identification. Since its first publication, several upgrades, new features and extensions have been added to the pipeline. The most important include P-site offset calculation during mapping, comprehensive data pre-exploration, two alternative proteoform calling strategies and extended output features. These novelties are illustrated by analyzing ribosome profiling data from the human HCT116 and Jurkat cell lines. The different proteoform calling strategies are run alongside one another, and their results are finally combined with reference sequences from UniProt. Matching mass spectrometry data are then searched against this extended search space with MaxQuant.
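P-site offset calculation assigns each ribosome footprint to the codon sitting in the ribosomal P-site. A common way to estimate these offsets, shown here purely as an illustration and not necessarily how PROTEOFORMER computes them, is to take, for each read length, the most frequent distance between the read's 5′ end and an annotated start codon, since initiating ribosomes hold the start codon in the P-site. The sketch assumes plus-strand reads given as hypothetical (chromosome, 0-based 5′ position, length) tuples and a set of annotated start-codon positions; the function name, the read-length window and the absence of strand handling are all assumptions.

```python
from collections import Counter, defaultdict


def estimate_psite_offsets(reads, start_codons, min_len=26, max_len=34):
    """Estimate one P-site offset per read length (illustrative sketch).

    reads: iterable of (chrom, five_prime_pos, length) for plus-strand
        alignments, 0-based (hypothetical input format).
    start_codons: set of (chrom, pos), pos being the 0-based position of
        the first nucleotide of an annotated start codon.
    Returns {read_length: offset}, where offset is the most frequent
    distance from the read's 5' end to a start codon it fully contains.
    """
    distances = defaultdict(Counter)
    for chrom, five_prime, length in reads:
        # Assumed footprint-length window; reads outside it are ignored.
        if not (min_len <= length <= max_len):
            continue
        # Only consider positions where the whole codon fits inside the read.
        for d in range(length - 2):
            if (chrom, five_prime + d) in start_codons:
                distances[length][d] += 1
    # Take the mode of the distance distribution for each read length.
    return {length: counts.most_common(1)[0][0]
            for length, counts in distances.items()}


reads = [("chr1", 88, 28)] * 3 + [("chr1", 90, 28)] + [("chr1", 87, 30)] * 2
print(estimate_psite_offsets(reads, {("chr1", 100)}))
```

Here three 28-nt reads start 12 nt upstream of the start codon and one starts 10 nt upstream, so the 28-nt offset is 12; the two 30-nt reads give 13. Estimating one offset per read length, rather than a single global value, reflects that footprints of different lengths can place the P-site at different distances from their 5′ end.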
Overall, besides annotated proteoforms, the pipeline leads to the identification and validation of several categories of new proteoforms, including translation products of upstream and downstream open reading frames, 5′ and 3′ extended and truncated proteoforms, single amino acid variants, splice variants and translation products of so-called noncoding regions. Furthermore, a proof of concept is reported for improving spectrum matching with Prosit, a deep neural network that predicts fragment ion intensities and thereby adds fragmentation spectrum intensity features to the analysis. In the context of ribosome profiling-driven proteogenomics, this is shown to allow the spectrum matches of newly identified proteoforms to be validated with greater stringency. These updates and conclusions provide new insights and lessons for the ribosome profiling-based proteogenomics field. More practical information on the pipeline, the source code, the user manual (README) and explanations of the different modes of availability can be found in the PROTEOFORMER GitHub repository: https://github.com/Biobix/proteoformer.
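Prosit contributes intensity features by predicting a fragment ion intensity pattern for each candidate peptide, which can then be compared with the observed spectrum. One similarity measure used with Prosit predictions is the normalized spectral contrast angle. The sketch below computes it, assuming that observed and predicted intensities have already been matched ion by ion into two equal-length vectors; peak matching, and how PROTEOFORMER feeds such scores into spectrum validation, are not shown, and the function name is hypothetical.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral contrast angle between aligned intensity vectors.

    For non-negative intensities the result lies in [0, 1]: 1 means identical
    relative intensities (independent of overall scale), 0 means no shared
    signal (orthogonal vectors).
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must be aligned to the same fragment ions")
    norm_obs = math.sqrt(sum(x * x for x in observed))
    norm_pred = math.sqrt(sum(y * y for y in predicted))
    if norm_obs == 0 or norm_pred == 0:
        # No signal in one of the spectra: treat as no similarity.
        return 0.0
    cosine = sum(x * y for x, y in zip(observed, predicted)) / (norm_obs * norm_pred)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return 1 - 2 * math.acos(cosine) / math.pi


print(spectral_angle([1, 2, 3], [2, 4, 6]))  # same relative intensities
print(spectral_angle([1, 0], [0, 1]))        # no overlapping signal
```

Because arccos is steep near a cosine of 1, converting the cosine into an angle spreads out high-similarity values, which makes differences between close matches easier to separate.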
