Proteogenomics analysis of non-coding region encoded peptides in normal tissues and five cancer types

Previous proteogenomics studies have identified peptides encoded by non-coding sequences, such as pseudogenes and long non-coding RNAs (lncRNAs), in healthy human tissues as well as in cancers. However, these studies analyzed either healthy or cancerous tissues alone, without a direct comparison between the two. In this study, we used an established proteogenomics analysis workflow to analyze proteomics data from 926 cancer samples across five cancer types and from 31 different healthy human tissues. We observed that the protein-level expression of pseudogenes can be classified as either ubiquitous or lineage-specific. The ubiquitously translated pseudogenes are homologous to housekeeping genes. Our results suggest a common mechanism underlying the translation of pseudogenes in both normal tissues and tumors. Moreover, we discovered several translated non-coding genes, such as DGCR5 and RHOXF1P3, that were up-regulated in tumors compared with normal tissues. These translated pseudogenes suggest that the biological functions of pseudogenes extend to the protein level, which remains to be studied. Finally, from the non-coding-region-encoded peptides detected specifically in tumors, we predicted a large number of potential neoantigens that could be developed into cancer vaccines.
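The abstract does not state how expression breadth was scored or how tumor specificity was defined. Purely as an illustration of the two filtering ideas, the sketch below assumes that a pseudogene counts as "ubiquitous" when its peptides are detected in at least a hypothetical 80% of the tissue panel, and that a peptide counts as "tumor-specific" when it is observed in tumor samples but in no normal tissue. The cutoff, the function names, and the toy identifiers are all hypothetical and are not the study's workflow.

```python
from typing import Dict, Set

# Hypothetical cutoff: the fraction of tissues in which a gene's peptides must
# be detected for it to be called "ubiquitous". Not taken from the study.
UBIQUITOUS_FRACTION = 0.8


def classify_expression(
    detected_in: Dict[str, Set[str]],
    all_tissues: Set[str],
    cutoff: float = UBIQUITOUS_FRACTION,
) -> Dict[str, str]:
    """Label each gene 'ubiquitous' or 'lineage-specific' by detection breadth."""
    labels: Dict[str, str] = {}
    for gene, tissues in detected_in.items():
        # Only count tissues that belong to the panel being analysed.
        fraction = len(tissues & all_tissues) / len(all_tissues)
        labels[gene] = "ubiquitous" if fraction >= cutoff else "lineage-specific"
    return labels


def tumor_specific(tumor_peptides: Set[str], normal_peptides: Set[str]) -> Set[str]:
    """Peptides observed in tumor samples but never in any normal tissue."""
    return tumor_peptides - normal_peptides


if __name__ == "__main__":
    # Toy inputs with hypothetical names; a 31-tissue panel mirrors the study design.
    tissues = {f"tissue_{i:02d}" for i in range(1, 32)}
    detected = {
        "PSEUDOGENE_A": set(tissues),                # detected in every tissue
        "PSEUDOGENE_B": {"tissue_03", "tissue_17"},  # detected in two tissues
    }
    print(classify_expression(detected, tissues))
    print(sorted(tumor_specific({"PEPTIDEK", "SAMPLER", "TESTPEPK"}, {"SAMPLER"})))
```

A simple set difference is the most conservative reading of "specifically detected in tumors"; a real analysis would also need to handle detection limits and peptide-level FDR before treating absence in normal tissue as meaningful.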
