Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins

Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open reading frames (altORFs) in addition to annotated protein-coding sequences (CDSs). We show that a large number of small proteins could in fact be encoded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70-amino-acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene that also encodes an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes that code for a large protein and one or several small proteins.
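
To make the notion of an altORF concrete, the following is a minimal Python sketch of how candidate altORFs might be enumerated on a single mRNA: scan the three forward reading frames for ATG-to-stop ORFs and keep every ORF other than the annotated CDS. It is an illustration only, not the annotation pipeline used in this work. The function names, the ATG-only start rule, and the default minimum length of 30 codons are assumptions made for the example.

```python
# Illustrative sketch only: names, the ATG-only start rule and the
# min_codons default are assumptions, not the authors' pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def find_orfs(mrna, min_codons=30):
    """Return (start, end, frame) for ATG-to-stop ORFs in the 3 forward frames.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    `min_codons` counts codons from the ATG through the stop codon.
    Within a frame, the first ATG after the previous stop is used as the start.
    """
    seq = mrna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, frame))
                start = None  # look for the next ORF in this frame
    return orfs


def find_alt_orfs(mrna, cds_start, cds_end, min_codons=30):
    """Return every ORF on the transcript except the annotated CDS."""
    return [
        orf for orf in find_orfs(mrna, min_codons)
        if (orf[0], orf[1]) != (cds_start, cds_end)
    ]


if __name__ == "__main__":
    # Toy transcript: annotated CDS at 12..21 (frame 0), one short ORF in frame 1.
    toy = "CATGAAATAGCCATGGCCTGA"
    print(find_alt_orfs(toy, cds_start=12, cds_end=21, min_codons=2))
```

On the toy transcript, the annotated CDS is excluded and the remaining ORF, which starts upstream in a different frame, is reported as a candidate altORF. A real analysis would also need to handle ORFs without an in-frame stop, non-ATG initiation, and transcripts annotated as non-coding.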
