PseudoFuN: Deriving functional potentials of pseudogenes from integrative relationships with genes and microRNAs across 32 cancers

Abstract

Background
Long thought to be “relics” of evolution, pseudogenes have only recently attracted medical interest for their regulatory roles in cancer. These regulatory roles are often a direct by-product of their close sequence homology to protein-coding genes. Novel pseudogene-gene (PGG) functional associations can be identified by integrating biomedical data such as sequence homology, functional pathways, gene expression, pseudogene expression, and microRNA expression. However, these data have not yet been fully integrated, and almost all previous pseudogene studies relied on 1:1 pseudogene–parent gene relationships without leveraging other homologous genes or pseudogenes.

Results
We construct PGG families that expand beyond the current 1:1 paradigm. First, we build expansive PGG databases by (i) CUDAlign graphics processing unit (GPU)-accelerated local alignment of all pseudogenes to gene families (totaling 1.6 billion individual local alignments and >40,000 GPU hours) and (ii) BLAST-based assignment of pseudogenes to gene families. Second, we create an open-source web application, PseudoFuN (Pseudogene Functional Networks), to search for integrative functional relationships spanning sequence homology, microRNA expression, gene expression, pseudogene expression, and gene ontology. We produce four “flavors” of CUDAlign-based databases (>462,000,000 PGG pairwise alignments and 133,770 PGG families) that can be queried and downloaded using PseudoFuN. These databases are consistent with previous 1:1 PGG annotations and are considerably more comprehensive, including millions of de novo PGG associations. For example, we find multiple known (e.g., miR-20a-PTEN-PTENP1) and novel (e.g., miR-375-SOX15-PPP4R1L) microRNA-gene-pseudogene associations in prostate cancer. PseudoFuN provides a “one-stop shop” for identifying and visualizing thousands of potential regulatory relationships involving pseudogenes in The Cancer Genome Atlas cancers.
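CUDAlign accelerates Smith-Waterman local alignment on GPUs. The recurrence it parallelizes, and one way pairwise scores could be turned into families, can be sketched on the CPU. This is a minimal sketch under stated assumptions, not the PseudoFuN pipeline. The scoring values (match 2, mismatch -1), the simplified linear gap penalty of -2 (production aligners typically use affine gaps), the hypothetical threshold `min_score`, and the rule that any linked pair joins the same family (connected components via union-find) are all illustrative choices.

```python
from itertools import combinations


def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty)."""
    # prev holds row i-1 of the dynamic-programming matrix; only two rows are kept in memory.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets an alignment start anywhere: this is what makes it local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def group_into_families(seqs: dict[str, str], min_score: int) -> list[set[str]]:
    """Hypothetical family rule: link every pair scoring >= min_score, return connected components."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Walk to the root of x's component, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in combinations(seqs, 2):
        if smith_waterman_score(seqs[x], seqs[y]) >= min_score:
            parent[find(x)] = find(y)  # merge the two components

    families: dict[str, set[str]] = {}
    for name in seqs:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())
```

For instance, `smith_waterman_score("ACGTACGTAC", "ACGTACGTTC")` returns 17 (nine matches and one mismatch). Each comparison costs time proportional to the product of the two sequence lengths. Repeated over 1.6 billion alignments, that quadratic cost is why GPU acceleration pays off.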

Conclusions
Bioinformaticians and oncologists alike can use a simple online tool to explore thousands of new PGG associations in the context of microRNA-gene-pseudogene co-expression and differential expression.
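Associations such as miR-20a-PTEN-PTENP1 follow the competing endogenous RNA (ceRNA) pattern, in which a gene and its pseudogene compete for a shared microRNA. The abstract does not say how PseudoFuN scores co-expression. The sketch below is a minimal illustration under these assumptions:
- expression profiles are compared with Pearson correlation across tumor samples;
- `r_min` is a hypothetical cutoff;
- a triplet counts as ceRNA-consistent when the gene and pseudogene are positively correlated and the microRNA is negatively correlated with both.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 samples)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def cerna_consistent(mirna: list[float], gene: list[float], pseudogene: list[float],
                     r_min: float = 0.3) -> bool:
    """Assumed ceRNA signature: gene and pseudogene rise together while the shared
    microRNA moves against both (r_min is a hypothetical cutoff)."""
    return (
        pearson(gene, pseudogene) >= r_min
        and pearson(mirna, gene) <= -r_min
        and pearson(mirna, pseudogene) <= -r_min
    )
```

A correlation screen like this only nominates candidates. Whether a pseudogene actually sequesters the microRNA still requires experimental follow-up, and differential expression between tumor and normal samples would be a separate test.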
