Discriminative Prediction of A-To-I RNA Editing Events from DNA Sequence

RNA editing is a post-transcriptional alteration of RNA sequences that, via insertions, deletions or base substitutions, can affect protein structure as well as RNA and protein expression. Recently, it has been suggested that RNA editing may be more frequent than previously thought. A major impediment to a deeper understanding of this process, however, is the substantial sequencing effort required to identify RNA editing events. Here, we describe an in silico approach, based on machine learning, that mitigates this problem. Using 41-nucleotide-long DNA sequences, we show that novel A-to-I RNA editing events can be predicted from known A-to-I RNA editing events, both within and across species. The validity of the proposed method was verified on an independent experimental dataset. Using our approach, 203,202 putative A-to-I RNA editing events were predicted across the whole human genome. Of these, 9% had been reported previously. The remaining sites require further validation, e.g., by targeted deep sequencing. In conclusion, the approach described here is a useful tool for identifying potential A-to-I RNA editing events without the need for extensive RNA sequencing.
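
To make the input representation concrete, the following is a minimal Python sketch of how 41-nucleotide DNA windows around candidate adenosines might be extracted and turned into numeric features for a classifier. It is an illustration under stated assumptions, not the method used in the paper: the abstract gives only the window length, so centering the candidate adenosine (20 flanking nucleotides on each side), the one-hot encoding, the restriction to the given strand, and the function names `candidate_windows` and `one_hot` are all hypothetical choices.

```python
from typing import Iterator, List, Tuple

WINDOW = 41            # window length stated in the abstract
FLANK = WINDOW // 2    # 20 nt per side; ASSUMES the candidate A is centered
ALPHABET = "ACGT"


def candidate_windows(seq: str) -> Iterator[Tuple[int, str]]:
    """Yield (position, window) for every A that has full flanks.

    Only the given strand is scanned; windows containing characters
    other than A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    for i in range(FLANK, len(seq) - FLANK):
        if seq[i] != "A":
            continue
        window = seq[i - FLANK : i + FLANK + 1]
        if set(window) <= set(ALPHABET):
            yield i, window


def one_hot(window: str) -> List[int]:
    """Encode a window as a flat 0/1 vector, four entries per base.

    This encoding is an assumption; the abstract does not say how
    sequences are represented for the learning algorithm.
    """
    vec: List[int] = []
    for base in window:
        vec.extend(1 if base == b else 0 for b in ALPHABET)
    return vec


if __name__ == "__main__":
    # Toy sequence with a single adenosine that has 20 nt on each side.
    toy = "C" * 20 + "A" + "G" * 20
    for pos, win in candidate_windows(toy):
        print(pos, len(win), len(one_hot(win)))
```

Feature vectors of this kind could then be passed to any standard classifier trained on known editing sites; which model and which negative examples to use are left open here, since the abstract does not specify them.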
