IRESpy: an XGBoost model for prediction of internal ribosome entry sites

Background: Internal ribosome entry sites (IRES) are segments of mRNA found in untranslated regions that can recruit the ribosome and initiate translation independently of the 5′ cap-dependent translation initiation mechanism. IRES usually function when 5′ cap-dependent translation initiation has been blocked or repressed. They have been widely found to play important roles in viral infections and cellular processes. However, only a limited number of confirmed IRES have been reported, because confirming them requires labor-intensive, slow, and low-efficiency laboratory experiments. Bioinformatics tools have been developed, but there is no reliable online tool.

Results: This paper systematically examines the features that can distinguish IRES from non-IRES sequences. Sequence features such as k-mer words, structural features such as QMFE, and sequence/structure hybrid features are evaluated as possible discriminators. They are incorporated into an IRES classifier based on XGBoost. The XGBoost model performs better than previous classifiers, with higher accuracy and much shorter computational time. The number of features in the model has been greatly reduced, compared to previous predictors, by including global k-mer and structural features. The contributions of model features are well explained by LIME and SHapley Additive exPlanations (SHAP). The trained XGBoost model has been implemented as a bioinformatics tool for IRES prediction, IRESpy (https://irespy.shinyapps.io/IRESpy/), which has been applied to scan the human 5′ UTR and find novel IRES segments.

Conclusions: IRESpy is a fast, reliable, high-throughput online IRES prediction tool. It provides a publicly available tool for all IRES researchers, and can be used in other genomics applications such as gene annotation and analysis of differential gene expression.
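To make the notion of k-mer sequence features concrete, the following is a minimal sketch of how global k-mer frequencies could be computed for one RNA sequence. It is an illustration under stated assumptions, not the authors' implementation: the function name `kmer_frequencies`, the choice of k = 2, the conversion of T to U, and the example sequence are all hypothetical, and the structural (QMFE) and hybrid features are not covered.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; DNA input is converted T -> U


def kmer_frequencies(seq, k=2):
    """Return the relative frequency of every k-mer over ACGU.

    Hypothetical helper for illustration only. Windows containing
    characters outside ACGU (for example N) are ignored.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    # Fixed key order gives a fixed-length feature vector (4**k entries).
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Made-up example sequence, not taken from the paper.
features = kmer_frequencies("AUGGCUAUGC", k=2)
```

A dictionary like `features` has one entry per possible k-mer, so its values can be laid out in a fixed order as one row of a feature matrix for a tree-based classifier such as XGBoost.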
