Mass Spectrometry-Based Proteomics Analyses Using the OpenProt Database to Unveil Novel Proteins Translated from Non-Canonical Open Reading Frames.

Genome annotation is central to today's proteomic research, as it draws the outlines of the proteomic landscape. Traditional models of open reading frame (ORF) annotation impose two arbitrary criteria: a minimum length of 100 codons and a single ORF per transcript. However, a growing number of studies report the expression of proteins from allegedly non-coding regions, challenging the accuracy of current genome annotations. These novel proteins were found to be encoded within non-coding RNAs, within the 5' or 3' untranslated regions (UTRs) of mRNAs, or in alternative ORFs overlapping a known coding sequence (CDS). OpenProt is the first database to enforce a polycistronic model for eukaryotic genomes, allowing the annotation of multiple ORFs per transcript. OpenProt is freely accessible and offers custom downloads of protein sequences for 10 species. Using the OpenProt database in proteomic experiments enables the discovery of novel proteins and highlights the polycistronic nature of eukaryotic genes. The full OpenProt database (all predicted proteins) is large, and its size needs to be taken into account during the analysis. However, with appropriate false discovery rate (FDR) settings or a restricted OpenProt database, users will gain a more realistic view of the proteomic landscape. Overall, OpenProt is a freely available tool that will foster proteomic discoveries.
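The abstract notes that the size of the full OpenProt database must be handled through appropriate FDR settings, but does not say how. The sketch below assumes the common target-decoy strategy and contrasts a single pooled FDR with an FDR estimated separately for reference and novel proteins; the separate estimate is one possible setting, not something the abstract prescribes. The `Psm` record, the class labels and the toy scores are hypothetical.

```python
# Sketch only: target-decoy FDR on peptide-spectrum matches (PSMs).
# Assumptions: higher scores are better; FDR at a threshold is estimated as
# (#decoys scoring at or above it) / (#targets scoring at or above it).
from typing import NamedTuple


class Psm(NamedTuple):
    score: float
    is_decoy: bool
    protein_class: str  # hypothetical labels, e.g. "reference" or "novel"


def accept_at_fdr(psms: list[Psm], max_fdr: float = 0.01) -> list[Psm]:
    """Return target PSMs whose q-value is <= max_fdr (one pooled estimate)."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR threshold at which this PSM would be accepted,
    # i.e. the running minimum of the FDR estimate from the bottom up.
    qvals, running = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qvals[i] = running
    return [p for p, q in zip(ranked, qvals) if not p.is_decoy and q <= max_fdr]


def accept_per_class(psms: list[Psm], max_fdr: float = 0.01) -> list[Psm]:
    """Estimate the FDR separately within each protein class, so that novel
    proteins cannot borrow confidence from well-scoring reference proteins."""
    accepted = []
    for cls in sorted({p.protein_class for p in psms}):
        accepted += accept_at_fdr([p for p in psms if p.protein_class == cls], max_fdr)
    return accepted


# Invented toy data: ten confident reference matches, and a novel class whose
# best-scoring match is a decoy.
reference = [Psm(s, False, "reference") for s in (20, 19, 18, 17, 16, 15, 14, 13, 12, 11)]
novel = [Psm(16.5, True, "novel"), Psm(15.5, False, "novel"), Psm(14.5, False, "novel")]
psms = reference + novel

pooled = accept_at_fdr(psms, max_fdr=0.10)
separate = accept_per_class(psms, max_fdr=0.10)
print(len(pooled), sum(p.protein_class == "novel" for p in pooled))
print(len(separate), sum(p.protein_class == "novel" for p in separate))
```

In the toy data, the pooled estimate accepts two novel PSMs because the many confident reference matches dilute the decoy count, while the class-specific estimate accepts none. The same dilution effect is why a very large search space calls for either stricter FDR handling or a restricted database.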