PROTEOFORMER: deep proteome coverage through ribosome profiling and MS integration

An increasing number of studies integrate mRNA sequencing data into MS-based proteomics to complement the translation product search space. However, several factors, including extensive regulation of mRNA translation and the need for three- or six-frame translation, impede the use of mRNA-seq data for the construction of a protein sequence search database. With that in mind, we developed the PROTEOFORMER tool, which automatically processes data from the recently developed ribosome profiling method (sequencing of ribosome-protected mRNA fragments), resulting in genome-wide visualization of ribosome occupancy. Our tool also includes a translation initiation site calling algorithm that allows the delineation of the open reading frames (ORFs) of all translation products. A complete protein synthesis-based sequence database can thus be compiled for mass spectrometry-based identification. This approach increases the overall protein identification rates by 3% and 11% (improved and new identifications) for human and mouse, respectively, and enables proteome-wide detection of 5′-extended proteoforms, upstream ORF translation and near-cognate translation start sites. The PROTEOFORMER tool is available as a stand-alone pipeline and has been implemented in the Galaxy framework for ease of use.
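
To make the ORF delineation step concrete, the following is a minimal sketch of how a protein sequence might be derived once a translation initiation site (TIS) has been called on a transcript. It is not PROTEOFORMER's implementation: the function name `delineate_orf`, its parameters, and the toy transcript are hypothetical, and the sketch assumes the standard genetic code and that a near-cognate start codon (for example CTG) is decoded as methionine.

```python
from typing import Optional

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def delineate_orf(transcript: str, tis: int, force_met: bool = True) -> Optional[str]:
    """Translate from a called TIS (0-based index) to the first in-frame stop.

    Returns the protein sequence without the stop, or None if no in-frame
    stop codon is found or the TIS itself is a stop codon.
    Assumption: with force_met=True the first residue is reported as 'M',
    even when the start codon is near-cognate (e.g. CTG).
    """
    seq = transcript.upper().replace("U", "T")
    residues = []
    for pos in range(tis, len(seq) - 2, 3):
        # Codons containing ambiguous bases are reported as 'X'.
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":
            if not residues:
                return None
            if force_met:
                residues[0] = "M"
            return "".join(residues)
        residues.append(residue)
    return None  # ran off the transcript without reaching a stop codon
```

On the toy transcript `AACTGGCCATGAAATGACC`, calling `delineate_orf(seq, 8)` starts at the ATG and yields `MK`, whereas `delineate_orf(seq, 2)` starts at the upstream in-frame CTG and yields `MAMK`, a 5′-extended proteoform of the same ORF. Because the reading frame is fixed by the called TIS, only one frame per site has to be translated, in contrast to the three- or six-frame translation needed when only mRNA-seq data are available.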
