How to Illuminate the Dark Proteome Using the Multi‐omic OpenProt Resource

Tens of thousands of open reading frames (ORFs) are hidden within genomes. These alternative ORFs, or small ORFs, have eluded annotation because they are either small or located in unsuspected places. They can be found in the untranslated regions of a messenger RNA, overlapping a known coding sequence in a messenger RNA, or anywhere in a “non‐coding” RNA. Serendipitous discoveries have highlighted the importance of these ORFs in biological functions and pathways. With their discovery came the need for deeper ORF annotation and large‐scale mining of public repositories to gather supporting experimental evidence. OpenProt, accessible at https://openprot.org/, is the first proteogenomic resource enforcing a polycistronic model of annotation across an exhaustive transcriptome for 10 species. Moreover, OpenProt reports experimental evidence accumulated through the re‐analysis of 114 mass spectrometry and 87 ribosome profiling datasets. The multi‐omics OpenProt resource also identifies predicted functional domains and evaluates conservation for all predicted ORFs. The OpenProt web server provides two query interfaces and one genome browser. The query interfaces allow exploration of the coding potential of genes or transcripts of interest, as well as custom downloads of all information contained in OpenProt. © 2020 The Authors.
