Accurate prediction of RNA 5-hydroxymethylcytosine modification by utilizing novel position-specific gapped k-mer descriptors
