Development of a new oligonucleotide block location-based feature extraction (BLBFE) method for the classification of riboswitches

As knowledge of genetics and genome elements grows, so does the demand for bioinformatics tools to analyze these data. Riboswitches are genetic components, usually located in the untranslated regions of mRNAs, that regulate gene expression. Their interaction with antibiotics has also been suggested recently, implying a role in antibiotic effects and resistance. Building on a previously published sequential block finding algorithm, we report here the development of a new block location-based feature extraction (BLBFE) strategy. This procedure uses the locations of family-specific sequential blocks on riboswitch sequences as features. For comparison, the performance of other feature extraction strategies was also investigated, including mono- and dinucleotide frequencies, k-mer, DAC, DCC, DACC, PC-PseDNC-General and SC-PseDNC-General methods. KNN, LDA, naïve Bayes, PNN and decision tree classifiers with V-fold cross-validation were applied to every feature extraction method, and their performances were compared. Accuracy, sensitivity, specificity and F-score were studied for each feature extraction method. The proposed feature extraction strategy classified riboswitches with an average correct classification rate (CCR) of 90.8%. The obtained data further confirmed the performance of the developed method, with an average accuracy of 96.1%, an average sensitivity of 90.8%, an average specificity of 97.52% and an average F-score of 90.69%. These results indicate that the proposed BLBFE method can classify and discriminate riboswitch families with high CCR, accuracy, sensitivity, specificity and F-score values.
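The idea of using block locations as features, and of averaging per-family performance measures, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify how locations are encoded, so the encoding here (1-based position of the first occurrence, 0 when the block is absent) is an assumption, as are the function names `block_location_features` and `macro_measures`, the example blocks, and the one-vs-rest macro-averaging of the measures.

```python
from typing import Dict, List, Sequence


def block_location_features(seq: str, blocks: Sequence[str]) -> List[int]:
    """Encode a sequence by where each block first occurs.

    Assumed encoding (not taken from the paper): 1-based start position
    of the first occurrence of each block, or 0 if the block is absent.
    """
    seq = seq.upper()
    # str.find returns -1 when the block is missing, so +1 maps that to 0.
    return [seq.find(block.upper()) + 1 for block in blocks]


def macro_measures(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Average one-vs-rest accuracy, sensitivity, specificity and F-score
    over the classes present in y_true (assumed averaging scheme)."""
    classes = sorted(set(y_true))
    n = len(y_true)
    totals = {"accuracy": 0.0, "sensitivity": 0.0, "specificity": 0.0, "f_score": 0.0}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = n - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f_score = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        totals["accuracy"] += (tp + tn) / n
        totals["sensitivity"] += sens
        totals["specificity"] += spec
        totals["f_score"] += f_score
    return {name: value / len(classes) for name, value in totals.items()}


if __name__ == "__main__":
    # Hypothetical blocks and sequence, for illustration only.
    print(block_location_features("GGACUUAGCC", ["ACU", "AGC", "UUU"]))
    print(macro_measures(["TPP", "TPP", "FMN", "FMN"], ["TPP", "FMN", "FMN", "FMN"]))
```

Under this encoding, each riboswitch sequence becomes a fixed-length vector with one entry per block, which can be passed to any of the classifiers named above. Under the averaging assumed here, the macro-averaged sensitivity equals the mean per-family correct classification rate.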
