Small Proteins Encoded by Unannotated ORFs are Rising Stars of the Proteome, Confirming Shortcomings in Genome Annotations and Current Vision of an mRNA

Short ORF‐encoded peptides and small proteins in eukaryotes have long been hidden in the shadow of large proteins. Recently, improved identification by MS‐based proteomics and ribosome profiling has led to the detection of large numbers of small proteins, and the variety of their functions is also emerging. It is therefore a good time to reflect on why small proteins remained invisible. Beyond the obvious technical challenge of detecting them, they were largely omitted from annotations and escaped detection because they were not sought. In this review, we identify conventions that need to be revisited, including the assumption that mature mRNAs carry only one coding sequence. The large‐scale discovery of small proteins and of their functions will require changing some paradigms and undertaking the annotation of ORFs that are still largely perceived as irrelevant coding information compared to already annotated coding sequences.
