PseUI: Pseudouridine sites identification based on RNA sequence information

Background: Pseudouridylation is the most prevalent type of posttranscriptional modification in various stable RNAs of all organisms, and it significantly affects many cellular processes that are regulated by RNA. Accurate identification of pseudouridine (Ψ) sites in RNA will therefore be of great benefit for understanding these cellular processes. Because currently available experimental methods are inefficient and costly, it is highly desirable to develop computational methods that detect Ψ sites in RNA sequences accurately and efficiently. However, the predictive accuracy of existing computational methods is not satisfactory and still needs improvement.

Results: In this study, we developed a new model, PseUI, for identifying Ψ sites in three species: H. sapiens, S. cerevisiae, and M. musculus. First, five kinds of features were generated from RNA segments: nucleotide composition (NC), dinucleotide composition (DC), pseudo dinucleotide composition (pseDNC), position-specific nucleotide propensity (PSNP), and position-specific dinucleotide propensity (PSDP). Then, a sequential forward feature selection strategy was used to obtain a compact feature subset that retains discriminative power. Based on the selected feature subsets, we built our model using a support vector machine (SVM). Finally, the generalization ability of our model was validated by both the jackknife test and independent validation tests on the benchmark datasets. The experimental results showed that our model is more accurate and stable than previously published models. We have also provided a user-friendly web server for our model at http://zhulab.ahu.edu.cn/PseUI, and brief instructions for using the web server are given in this paper. By following these instructions, academic users can conveniently obtain their desired results without carrying out complicated calculations.

Conclusion: In this study, we proposed a new predictor, PseUI, to detect Ψ sites in RNA sequences. Our model outperformed the existing state-of-the-art models on the benchmark datasets. We expect that PseUI will become a useful tool for the accurate identification of RNA Ψ sites.
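
To make the pipeline described above more concrete, the following Python sketch illustrates three of its ingredients on toy data: the NC and DC encodings, one common formulation of PSNP (per-position nucleotide frequency in positive samples minus that in negative samples), and greedy sequential forward feature selection scored by a jackknife (leave-one-out) test. It is not the PseUI implementation. The segments, labels, and all function names are hypothetical, and a nearest-centroid classifier stands in for the SVM so that the example needs only the standard library.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def nc(seq):
    """Nucleotide composition: frequency of A, C, G, U (4 values)."""
    return [seq.count(n) / len(seq) for n in NUCLEOTIDES]


def dc(seq):
    """Dinucleotide composition: frequency of each overlapping pair (16 values)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(d) / len(pairs) for d in DINUCLEOTIDES]


def psnp_table(positives, negatives):
    """Assumed PSNP form: per-position frequency in positives minus negatives."""
    def freq(seqs, i, n):
        return sum(s[i] == n for s in seqs) / len(seqs)
    return [{n: freq(positives, i, n) - freq(negatives, i, n) for n in NUCLEOTIDES}
            for i in range(len(positives[0]))]


def psnp(seq, table):
    """Encode a segment by the propensity of the nucleotide seen at each position."""
    return [table[i][n] for i, n in enumerate(seq)]


def loo_accuracy(X, y, cols):
    """Jackknife accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    correct = 0
    for held in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[i] for i in range(len(X)) if i != held and y[i] == label]
            centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
        point = [X[held][c] for c in cols]
        pred = min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(point, centroids[lab])))
        correct += pred == y[held]
    return correct / len(X)


def forward_select(X, y, max_features):
    """Greedy forward selection: add the feature that most improves the score."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scores = [(loo_accuracy(X, y, selected + [c]), c) for c in remaining]
        score, col = max(scores, key=lambda t: t[0])  # first best wins ties
        if score <= best:
            break  # no remaining feature improves the jackknife accuracy
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


# Hypothetical 5-nt segments centred on U; label 1 = Ψ site, 0 = non-Ψ site.
positives = ["GAUCA", "AAUGA", "GAUAA", "AGUAA"]
negatives = ["CCUGC", "UCUCC", "CGUCC", "CCUCG"]
segments = positives + negatives
y = [1] * len(positives) + [0] * len(negatives)

X = [nc(s) + dc(s) for s in segments]          # 20 composition features
selected, score = forward_select(X, y, max_features=3)
print("selected feature indices:", selected, "jackknife accuracy:", score)

table = psnp_table(positives, negatives)
print("PSNP encoding of first segment:", psnp(segments[0], table))
```

In a faithful jackknife test, any statistic learned from the training data, such as the PSNP table, would be recomputed with the held-out sample excluded. For that reason the selection demo above uses only the NC and DC features, which depend on the segment alone.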

[77]  K. Chou,et al.  Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. , 2015, Molecular bioSystems.