Classification and function of small open reading frames

Small open reading frames (smORFs) of 100 codons or fewer are usually — if arbitrarily — excluded from proteome annotations. Despite this, the genomes of many metazoans, including humans, contain millions of smORFs, some of which fulfil key physiological functions. Recently, the transcriptome of Drosophila melanogaster was shown to contain thousands of smORFs of different classes that actively undergo translation, which produces peptides of mostly unknown function. Here, we present a comprehensive analysis of smORFs in flies, mice and humans. We propose the existence of several functional classes of smORFs, ranging from inert DNA sequences to transcribed and translated cis-regulators of translation and peptides with a propensity to function as regulators of membrane-associated proteins, or as components of ancient protein complexes in the cytoplasm. We suggest that the different smORF classes could represent steps in gene, peptide and protein evolution. Our analysis introduces a distinction between different peptide-coding classes of smORFs in animal genomes, and highlights the role of model organisms for the study of small peptide biology in the context of development, physiology and human disease.
