PRESa2i: incremental decision trees for prediction of Adenosine to Inosine RNA editing sites

RNA editing is a crucial cellular process that affects protein encoding and is sometimes correlated with the cause of fatal diseases, such as cancer. Knowing where the editing sites lie in an RNA sequence is therefore important. Adenosine to Inosine (A-to-I) is the most common RNA editing event. In this paper, we present PRESa2i, a computational prediction tool for identifying A-to-I RNA editing sites in given RNA sequences. PRESa2i uses a simple yet effective set of sequence-based features generated from RNA sequences, together with a novel feature selection technique, and an incremental decision tree algorithm as its classifier. On a standard benchmark dataset and an independent set, it achieves 86.48% accuracy and 90.67% sensitivity and significantly outperforms state-of-the-art methods. We have also implemented a web application based on PRESa2i and made it freely available at: http://brl.uiu.ac.bd/presa2i/index.php. The materials for this paper are also available from: https://github.com/swakkhar/RNA-Editing/.
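
The abstract does not list the exact sequence-based features PRESa2i uses. As a purely illustrative sketch, one common kind of sequence-based feature is the normalized k-mer composition of an RNA sequence. The function name `kmer_frequencies`, the choice of k, and the example sequence are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides; DNA-style "T" is mapped to "U" below


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper for illustration only; not the feature set
    described in the paper.
    """
    seq = seq.upper().replace("T", "U")
    # All 4**k possible k-mers, e.g. AA, AC, AG, AU, CA, ... for k=2
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Ignore windows containing symbols outside the alphabet (e.g. N)
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


# Example: a 16-dimensional dinucleotide feature vector
features = kmer_frequencies("AUGGCAUAGC", k=2)
```

A fixed-length vector of this kind could be passed to any classifier, including an incremental decision tree; the feature selection step mentioned in the abstract is not shown here.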
