Support Vector Machine Classifier for Accurate Identification of piRNA

Piwi-interacting RNA (piRNA) is a newly identified class of small non-coding RNAs. It can combine with PIWI proteins to regulate the transcriptional gene silencing process, heterochromatin modifications, and to maintain germline and stem cell function in animals. To better understand the function of piRNA, it is imperative to improve the accuracy of identifying piRNAs. In this study, the sequence information included the single nucleotide composition, and 16 dinucleotides compositions, six physicochemical properties in RNA, the position specificities of nucleotides both in N-terminal and C-terminal, and the proportions of the similar peptide sequence of both N-terminal and C-terminal in positive and negative samples, which were used to construct the feature vector. Then, the F-Score was applied to choose an optimal single type of features. By combining these selected features, we achieved the best results on the jackknife and the 5-fold cross-validation running 10 times based on the support vector machine algorithm. Moreover, we further evaluated the stability and robustness of our new method.

[1]  Q. Zou,et al.  Protein Folds Prediction with Hierarchical Structured SVM , 2016 .

[2]  Haifan Lin,et al.  A novel class of small RNAs in mouse spermatogenic cells. , 2006, Genes & development.

[3]  Zhiping Weng,et al.  The piRNA targeting rules and the resistance to piRNA silencing in endogenous genes , 2018, Science.

[4]  S. Khan,et al.  Unb-DPC: Identify mycobacterial membrane protein types by incorporating un-biased dipeptide composition into Chou's general PseAAC. , 2017, Journal of theoretical biology.

[5]  T. Tsunoda,et al.  SucStruct: Prediction of succinylated lysine residues by using structural properties of amino acids. , 2017, Analytical biochemistry.

[6]  Gang Tian,et al.  Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features , 2016, PloS one.

[7]  Chen Lin,et al.  LibD3C: Ensemble classifiers with a clustering and dynamic selection strategy , 2014, Neurocomputing.

[8]  Quan Zou,et al.  O‐GlcNAcPRED‐II: an integrated classification algorithm for identifying O‐GlcNAcylation sites based on fuzzy undersampling and a K‐means PCA oversampling technique , 2018, Bioinform..

[9]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[10]  Feng Liu,et al.  A genetic algorithm-based weighted ensemble method for predicting transposon-derived piRNAs , 2016, BMC Bioinformatics.

[11]  Junjie Chen,et al.  Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences , 2015, Nucleic Acids Res..

[12]  Kuo-Chen Chou,et al.  2L-piRNA: A Two-Layer Ensemble Classifier for Identifying Piwi-Interacting RNAs and Their Function , 2017, Molecular therapy. Nucleic acids.

[13]  M. Bakhtiarizadeh,et al.  OOgenesis_Pred: A sequence-based method for predicting oogenesis proteins by six different modes of Chou's pseudo amino acid composition. , 2017, Journal of theoretical biology.

[14]  Maqsood Hayat,et al.  Author ' s Accepted Manuscript Classification of membrane protein types using Voting feature interval in combination with Chou ' s pseudo amino acid composition , 2015 .

[15]  Dong Xu,et al.  Computational Identification of Protein Methylation Sites through Bi-Profile Bayes Feature Extraction , 2009, PloS one.

[16]  Hui Xiao,et al.  NONCODE v3.0: integrative annotation of long noncoding RNAs , 2011, Nucleic Acids Res..

[17]  Hao Chen,et al.  pirScan: a webserver to predict piRNA targeting sites and to avoid transgene silencing in C. elegans , 2018, Nucleic Acids Res..

[18]  Yu Liu,et al.  Bioinformatics: The Impact of Accurate Quantification on Proteomic and Genetic Analysis and Research , 2014 .

[19]  Xiang Chen,et al.  Incorporating key position and amino acid residue features to identify general and species-specific Ubiquitin conjugation sites , 2013, Bioinform..

[20]  Cangzhi Jia,et al.  Prediction of mitochondrial proteins of malaria parasite using bi-profile Bayes feature extraction. , 2011, Biochimie.

[21]  Jijun Tang,et al.  Local-DPP: An improved DNA-binding protein prediction method by exploring local evolutionary information , 2017, Inf. Sci..

[22]  Haifan Lin,et al.  MIWI associates with translational machinery and PIWI-interacting RNAs (piRNAs) in regulating spermatogenesis , 2006, Proceedings of the National Academy of Sciences.

[23]  C. Sander,et al.  A novel class of small RNAs bind to MILI protein in mouse testes , 2006, Nature.

[24]  Li Liu,et al.  piRBase: a web resource assisting piRNA functional study , 2014, Database J. Biol. Databases Curation.

[25]  Stephen A. Billings,et al.  A new maximum relevance-minimum multicollinearity (MRmMC) method for feature selection and ranking , 2017, Pattern Recognit..

[26]  Molly Hammell,et al.  piRNA-directed cleavage of meiotic transcripts regulates spermatogenesis , 2015, Genes & development.

[27]  Geoffrey I. Webb,et al.  Cascleave: towards more accurate prediction of caspase substrate cleavage sites , 2010, Bioinform..

[28]  Anthony J. Kusalik,et al.  Computational phosphorylation site prediction in plants using random forests and organism-specific instance weights , 2013, Bioinform..

[29]  Wei Gu,et al.  RNA-MethylPred: A high-accuracy predictor to identify N6-methyladenosine in RNA. , 2016, Analytical biochemistry.

[30]  Jiaxiang Du,et al.  Identification and verification of potential piRNAs from domesticated yak testis , 2017, Reproduction.

[31]  Taiowa A. Montgomery,et al.  piRNA Rules of Engagement. , 2018, Developmental cell.

[32]  Prabina Kumar Meher,et al.  Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou’s general PseAAC , 2017, Scientific Reports.

[33]  Yi Zhang,et al.  A k-mer scheme to predict piRNAs and characterize locust piRNAs , 2011, Bioinform..

[34]  Fei Li,et al.  Prediction of piRNAs using transposon interaction and a support vector machine , 2014, BMC Bioinformatics.