i6mA-DNCP: Computational Identification of DNA N6-Methyladenine Sites in the Rice Genome Using Optimized Dinucleotide-Based Features

DNA N6-methyladenine (6mA) plays an important role in regulating gene expression in eukaryotes. Accurate identification of 6mA sites can help clarify the genomic distribution and biological functions of 6mA. Various experimental methods have been applied to detect 6mA sites genome-wide, but they are time-consuming and expensive, so computational methods that rapidly identify 6mA sites are needed. In this paper, we propose i6mA-DNCP, a new machine learning-based method for identifying 6mA sites in the rice genome. DNA sequences are first represented by dinucleotide composition and dinucleotide-based DNA properties. After a specially designed DNA property selection process, a bagging classifier is used to build the prediction model. In the jackknife test on a benchmark dataset, i6mA-DNCP achieved 84.43% sensitivity, 88.86% specificity, 86.65% accuracy, a Matthews correlation coefficient (MCC) of 0.734, and an area under the receiver operating characteristic curve (AUC) of 0.926. Moreover, three independent datasets were established to assess the generalization ability of our method. Extensive experiments validated the effectiveness of i6mA-DNCP.
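
As a rough illustration of two ingredients named in the abstract, the sketch below computes a 16-dimensional dinucleotide composition vector for a DNA sequence and derives sensitivity, specificity, accuracy, and MCC from confusion-matrix counts. It is a minimal sketch under stated assumptions, not the i6mA-DNCP implementation: the function names `dinucleotide_composition` and `binary_metrics` are hypothetical, the example sequence and counts are invented for illustration, and the DNA property features, the property selection step, and the bagging classifier are not reproduced.

```python
from itertools import product
from math import sqrt

# The 16 possible dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_composition(seq):
    """Return the normalized frequency of each dinucleotide in seq.

    Hypothetical helper: overlapping pairs are counted, and pairs that
    contain characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = {d: 0 for d in DINUCLEOTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def binary_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity
    sp = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)  # accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Invented sequence and counts, for illustration only.
    features = dinucleotide_composition("ACGTAC")
    print(dict(zip(DINUCLEOTIDES, features)))
    print(binary_metrics(tp=84, fn=16, tn=89, fp=11))
```

In a jackknife (leave-one-out) test, each sample is held out once while the model is trained on the rest, and the counts passed to a function like `binary_metrics` are accumulated over all held-out predictions.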
