Repurposing the dark genome. II - Reverse Proteins

Based on the expression blueprint encoded in the genome, three groups of sequences have been identified: protein-encoding, RNA-encoding, and non-expressing. We asked: Why did nature choose a particular DNA sequence for expression? Did she sample every possibility, approving some for RNA synthesis and some for protein synthesis, and retiring or ignoring the rest? If evolution randomly selected sequences for metabolic trials, how much non-utilized (non-expressing) and under-utilized (only RNA-encoding) information is currently available for innovation? These questions led us to experimentally synthesize functional proteins from intergenic sequences of E. coli (Dhar et al. 2009). The current work extends that original report by considering natural protein-coding sequences ‘read backward’ to generate a new possibility. Reverse proteins are full-length ‘translation equivalents’ of existing protein-coding genes read in the -1 frame. The structural, functional and interaction predictions for reverse proteins in E. coli, S. cerevisiae and D. melanogaster open up a new opportunity to produce ‘first-in-class’ proteins towards functional endpoints. This study points to a large untapped genomic space from both fundamental biology and application perspectives.
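As a rough illustration of what ‘reading a coding sequence backward’ could mean computationally, the following is a minimal Python sketch, not the authors' pipeline. The abstract does not specify whether the sequence is simply reversed or reverse-complemented before translation, so both readings are shown. The function names and the toy sequence are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the standard table applies; organism-specific tables are ignored here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA from its first base; '*' marks a stop codon,
    'X' marks a codon containing a non-ACGT character."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop any trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def reverse_sequence(dna: str) -> str:
    """Interpretation 1 (assumed): the same strand read end to start."""
    return dna.upper()[::-1]


def reverse_complement(dna: str) -> str:
    """Interpretation 2 (assumed): the opposite strand, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical toy coding sequence, for illustration only.
cds = "ATGGCCAAATAA"
print(translate(cds))                      # MAK*
print(translate(reverse_sequence(cds)))    # NKPV
print(translate(reverse_complement(cds)))  # LFGH
```

Either reading yields a peptide unrelated to the native product, and a real analysis would additionally need to handle internal stop codons that appear in the reversed frame.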

[1]  P. Dhar, et al. tREPs—A New Class of Functional tRNA-Encoded Peptides, 2022, ACS Omega.

[2]  Evan Bolton, et al. Database resources of the National Center for Biotechnology Information, 2017, Nucleic Acids Res.

[3]  N. Raj, et al. Function annotation of peptides generated from the non-coding regions of D. melanogaster genome, 2016, Bioinformation.

[4]  Seema Sehrawat, et al. In silico study of peptide inhibitors against BACE 1, 2015, Systems and Synthetic Biology.

[5]  Shailja Singh, et al. Making novel proteins from pseudogenes, 2015, Bioinform.

[6]  Olga Golosova, et al. Unipro UGENE: a unified bioinformatics toolkit, 2012, Bioinform.

[7]  Ashok Sharma, et al. Structure prediction and functional characterization of secondary metabolite proteins of Ocimum, 2011, Bioinformation.

[8]  P. Karplus, et al. A fresh look at the Ramachandran plot and the occurrence of standard structures in proteins, 2010, Biomolecular Concepts.

[9]  F. Eisenhaber, et al. Synthesizing non-natural parts from natural genomic template, 2009, Journal of Biological Engineering.

[10]  Yang Zhang, et al. I-TASSER server for protein 3D structure prediction, 2008, BMC Bioinformatics.

[11]  Paul Horton, et al. WoLF PSORT: protein localization predictor, 2007, Nucleic Acids Res.

[12]  Paul T. Groth, et al. The ENCODE (ENCyclopedia Of DNA Elements) Project, 2004, Science.

[13]  Ron D. Appel, et al. ExPASy: the proteomics server for in-depth protein knowledge and analysis, 2003, Nucleic Acids Res.

[14]  Michael Zuker, et al. Mfold web server for nucleic acid folding and hybridization prediction, 2003, Nucleic Acids Res.

[15]  A. Sparks, et al. Using the transcriptome to annotate the genome, 2002, Nature Biotechnology.

[16]  E. Myers, et al. Basic local alignment search tool, 1990, Journal of Molecular Biology.

[17]  S. Ohno, et al. So much "junk" DNA in our genome, 1972, Brookhaven Symposia in Biology.