i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome

Motivation: DNA N6-methyladenine (6mA) is associated with a wide range of biological processes. Because 6mA sites are distributed non-randomly in the genome, accurately identifying them is crucial for understanding the biological functions of 6mA. Although experimental methods have been proposed for this purpose, they remain too costly for genome-wide detection of 6mA sites. It is therefore desirable to develop computational methods that facilitate the identification of 6mA sites.

Results: In this study, a computational method called i6mA-Pred was developed to identify 6mA sites in the rice genome. DNA sequences were encoded using the optimal nucleotide chemical properties, which were selected with a feature selection technique. i6mA-Pred achieved an accuracy of 83.13% in the jackknife test, and its performance was also superior to that of other methods.

Availability: A user-friendly web server for i6mA-Pred is freely accessible at http://lin-group.cn/server/i6mA-Pred.
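
The encoding and evaluation steps summarized in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the i6mA-Pred implementation. The three-property mapping in `NCP` (ring structure, functional group, hydrogen bonding) is assumed. The feature-selection step that picks the optimal properties is omitted. The nearest-centroid classifier is a hypothetical stand-in, used only so that the jackknife (leave-one-out) loop runs. All function names and the toy sequences are illustrative.

```python
from typing import Callable, Dict, List, Sequence

# Assumed nucleotide chemical property (NCP) mapping, one binary flag per property:
#   ring structure   (purine A/G = 1, pyrimidine C/T = 0)
#   functional group (amino A/C = 1, keto G/T = 0)
#   hydrogen bonding (weak A/T = 1, strong C/G = 0)
NCP: Dict[str, tuple] = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def encode_ncp(seq: str) -> List[int]:
    """Flatten a DNA sequence into 3 * len(seq) binary chemical-property features."""
    features: List[int] = []
    for base in seq.upper():
        if base not in NCP:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        features.extend(NCP[base])
    return features


def jackknife_accuracy(
    X: List[List[float]],
    y: List[int],
    fit: Callable,
    predict: Callable,
) -> float:
    """Leave-one-out (jackknife) accuracy: each sample is predicted by a model
    trained on all the other samples."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical stand-in classifier (nearest centroid), NOT the classifier used by i6mA-Pred.
def fit_centroids(X: List[List[float]], y: List[int]) -> Dict[int, List[float]]:
    """Return the mean feature vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
        counts[yi] = counts.get(yi, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids: Dict[int, List[float]], x: Sequence[float]) -> int:
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(label: int) -> float:
        return sum((c - v) ** 2 for c, v in zip(centroids[label], x))
    return min(centroids, key=sq_dist)


if __name__ == "__main__":
    print(encode_ncp("ACGT"))  # [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1]
    # Made-up toy sequences and labels (1 = positive, 0 = negative), for illustration only.
    seqs = ["AAAA", "AAAT", "AATA", "CCCC", "CCGC", "GCCC"]
    labels = [1, 1, 1, 0, 0, 0]
    X = [encode_ncp(s) for s in seqs]
    print(jackknife_accuracy(X, labels, fit_centroids, predict_centroid))  # 1.0
```

In a jackknife test every sample is held out exactly once, so n models are trained for n samples, and the reported accuracy is the fraction of held-out samples that are predicted correctly.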
