Modeling circRNA expression pattern with integrated sequence and epigenetic features demonstrates the potential involvement of H3K79me2 in circRNA expression

MOTIVATION: CircRNAs are an abundant class of noncoding RNAs with widespread, cell- and tissue-specific expression patterns. Previous work suggested that epigenetic features might be related to circRNA expression. However, the contribution of epigenetic changes to circRNA expression has not been investigated systematically. Here we built a machine learning framework, named CIRCScan, to predict circRNA expression in various cell lines based on sequence and epigenetic features.

RESULTS: The expression status models predicted with high accuracy, with area under the ROC curve (AUC) values of 0.89 to 0.92 and false positive rates (FPR) of 0.17 to 0.25. CircRNAs predicted to be expressed were further validated with RNA-seq data. The expression level prediction models also performed well, with normalized root-mean-square errors (RMSE) of 0.28 to 0.30, Pearson's correlation coefficient r (PCC) over 0.4 in all cell lines, and Spearman's correlation coefficient ρ of 0.33 to 0.46. Notably, H3K79me2 was highly ranked in modeling both circRNA expression status and expression levels across different cells. Further analysis in 9 additional cell lines demonstrated a significant enrichment of H3K79me2 in the intron regions flanking circRNAs, supporting the potential involvement of H3K79me2 in the regulation of circRNA expression.

AVAILABILITY: The CIRCScan assembler is freely available online for academic use at https://github.com/johnlcd/CIRCScan.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
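The evaluation metrics quoted in the results (AUC, FPR, normalized RMSE, Pearson's r and Spearman's ρ) can be illustrated with a minimal, dependency-free sketch. This is not code from CIRCScan. The function names are hypothetical, and normalizing the RMSE by the range of the observed values is an assumption, since the abstract does not say which normalization was used.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def roc_auc(labels, scores):
    """AUC computed from the Mann-Whitney U statistic (labels are 0 or 1)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def false_positive_rate(labels, predicted):
    """FPR = FP / (FP + TN) for binary expressed / not-expressed calls."""
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 0)
    return fp / (fp + tn)


def pearson(x, y):
    """Pearson's correlation coefficient r."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def normalized_rmse(observed, predicted):
    """RMSE divided by the observed range (assumed normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (max(observed) - min(observed))
```

In this framing, `roc_auc` and `false_positive_rate` would apply to the expression status (classification) models, while `normalized_rmse`, `pearson` and `spearman` would apply to the expression level (regression) models.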
