Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA

Background
RNA editing is the process whereby an RNA sequence is modified from the sequence of the corresponding DNA template. In the mitochondria of land plants, some cytidines are converted to uridines before translation. Despite substantial study, the molecular mechanism by which C-to-U RNA editing proceeds remains relatively obscure, although several experimental studies have implicated a role for cis-recognition. A highly non-random distribution of nucleotides is observed in the immediate vicinity of edited sites (within 20 nucleotides 5' and 3'), but no precise consensus motif has been identified.

Results
Data for analysis were derived from the complete mitochondrial genomes of Arabidopsis thaliana, Brassica napus, and Oryza sativa; additionally, a combined data set of observations across all three genomes was generated. Each data set consisted of the 20 nucleotides 5' and the 20 nucleotides 3' of each edited site, together with an equivalently sized and appropriately constructed null set of non-edited sites. We used tree-based statistical methods and random forests to generate models of C-to-U RNA editing based on the nucleotides surrounding the edited and non-edited sites and on the estimated folding energies of those regions. The tree-based models yielded optimistic, re-substitution-based estimates of ~0.71 accuracy, ~0.64 sensitivity, and ~0.88 specificity. Random forest analysis yielded better models and more exact performance estimates, with ~0.74 accuracy, ~0.72 sensitivity, and ~0.81 specificity for the combined observations.

Conclusions
Simple models do moderately well in predicting which cytidines will be edited to uridines, and provide the first quantitative predictive models for RNA edited sites in plant mitochondria. Our analysis shows that the identity of the nucleotide at position -1 relative to the edited C and the estimated free energy of folding for a 41 nt region surrounding the edited C are the most important variables that distinguish most edited from non-edited sites. However, the results suggest that primary sequence data and simple free energy of folding calculations alone are insufficient to make highly accurate predictions.
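To make the quantities above concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes a sequence held as a plain string and 0-based positions, and shows how a 41 nt window (20 nt 5', the candidate C, 20 nt 3') could be extracted and how accuracy, sensitivity, and specificity follow from predicted and true labels. The function names `window_around` and `confusion_metrics` and the toy inputs are hypothetical.

```python
from typing import Dict, Optional, Sequence

FLANK = 20  # nucleotides taken on each side of the candidate C (per the abstract)


def window_around(seq: str, pos: int, flank: int = FLANK) -> Optional[str]:
    """Return the (2 * flank + 1)-nt window centred on seq[pos].

    Returns None if the window would run off either end of the
    sequence or if the central nucleotide is not a C.
    """
    start, end = pos - flank, pos + flank + 1
    if start < 0 or end > len(seq):
        return None
    if seq[pos].upper() != "C":
        return None
    return seq[start:end].upper()


def confusion_metrics(truth: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Labels: 1 = edited site, 0 = non-edited site.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty class) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(truth)),
        "sensitivity": ratio(tp, tp + fn),  # fraction of edited sites recovered
        "specificity": ratio(tn, tn + fp),  # fraction of non-edited sites rejected
    }


if __name__ == "__main__":
    # Toy sequence (made up for illustration): 20 A's, a C, then 20 G's.
    toy = "A" * 20 + "C" + "G" * 20
    win = window_around(toy, 20)
    print(len(win), win[FLANK - 1])  # window length and the -1 nucleotide
    print(confusion_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

In this layout the nucleotide at position -1 is simply `window[FLANK - 1]`, which is one way the per-position sequence features described above could be read off before being passed to a tree-based classifier.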
