Cr-Prom: A Convolutional Neural Network-Based Model for the Prediction of Rice Promoters

A promoter is a regulatory region of DNA, typically located upstream of a gene, that plays a key role in regulating gene transcription. Accurate prediction of promoters is crucial for analyzing gene expression patterns and for building and understanding genetic regulatory networks. The genomes of several species have been sequenced, and their gene content has been established to a large extent. Several bioinformatics algorithms have been developed to predict promoters across a broad range of plants; however, few studies have addressed promoter identification in rice specifically, which may limit practical applications. Here, we present a rice promoter prediction tool, Cr-Prom. The predictor was built using a series of sequence-based features and datasets extracted from the PlantProm and RAP-DB databases. We applied a convolutional neural network (CNN)-based strategy to construct a predictor with robust classification performance. To assess its performance, we ran experiments on a benchmark dataset using 5-fold cross-validation and compared our results with existing techniques using four figures of merit. In addition, Cr-Prom was evaluated on an independent dataset. Based on the results, Cr-Prom outperformed the existing rice-specific promoter predictors. The Cr-Prom tool can be freely accessed at: http://nsclbio.jbnu.ac.kr/tools/Cr-Prom/
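The evaluation protocol described above (sequence-based input, 5-fold cross-validation, four figures of merit) can be illustrated with a minimal sketch. The abstract does not state how sequences are encoded or which four metrics are used, so everything below is an assumption for illustration only: one-hot encoding is a common input format for CNNs on DNA, and sensitivity, specificity, accuracy, and the Matthews correlation coefficient are a common choice of metrics in promoter-prediction studies. The function names `one_hot`, `k_fold_indices`, and `binary_metrics` are hypothetical and are not part of Cr-Prom.

```python
import math
import random


def one_hot(seq):
    """Encode a DNA string as rows of 4 values in A, C, G, T order (assumed encoding)."""
    table = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
             "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
    # Ambiguous bases such as N map to an all-zero row.
    return [table.get(base, [0, 0, 0, 0]) for base in seq.upper()]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, and MCC (an assumed set of four metrics)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

In such a setup, a model would be trained on each `train` split, scored on the matching `test` split with `binary_metrics`, and the five results averaged; the model itself is omitted here because the abstract gives no architectural details.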
