Supervised Learning Classification Models for Prediction of Plant Virus Encoded RNA Silencing Suppressors

Virus-encoded RNA silencing suppressor proteins interfere with the host RNA silencing machinery, facilitating viral infection by evading host immunity. In plant hosts, these viral proteins have several implications for basic science and applications in biotechnology. However, in silico identification of these proteins is limited by their high sequence diversity. In this study, we developed supervised learning based classification models for RNA silencing suppressor proteins of plant viruses. We built four classifiers using the J48, Random Forest, LibSVM, and Naïve Bayes algorithms, and enriched model learning with correlation-based feature selection. Structural and physicochemical features calculated from the primary sequences of experimentally verified proteins were used to train the classifiers. The training features include amino acid composition; autocorrelation coefficients; composition, transition, and distribution of various physicochemical properties; and pseudo amino acid composition. Performance analysis of the predictive models, based on 10-fold cross-validation and independent data testing, showed that the Random Forest based model performed best, achieving 86.11% overall accuracy and 86.22% balanced accuracy, with a high area under the receiver operating characteristic (ROC) curve of 0.95, in predicting viral RNA silencing suppressor proteins. The prediction models for plant viral RNA silencing suppressors can potentially aid the identification of novel viral RNA silencing suppressors, which would provide valuable insights into the mechanism of RNA silencing and could be further explored as potential targets for designing novel antiviral therapeutics. In addition, the key subset of optimal features identified may help in determining compositional patterns in the viral proteins that are important determinants of RNA silencing suppressor activity.
The best prediction model developed in the study is available as a freely accessible web server, pVsupPred, at http://bioinfo.icgeb.res.in/pvsup/.
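
To make two of the quantities named above concrete, the following is a minimal Python sketch, not the study's actual pipeline. It computes the amino acid composition of a protein sequence (the simplest of the listed feature groups) and contrasts overall accuracy with balanced accuracy. It assumes balanced accuracy is the mean of sensitivity and specificity, which is the usual definition but is not spelled out in the text above. The function names, the toy sequence, and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid, in AMINO_ACIDS order."""
    seq = sequence.upper()
    # Non-standard symbols (e.g. X, B, Z) are ignored; this is an assumption.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy_metrics(tp, tn, fp, fn):
    """Return (overall accuracy, balanced accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    overall = (tp + tn) / (tp + tn + fp + fn)
    balanced = (sensitivity + specificity) / 2  # assumed definition
    return overall, balanced


# Toy inputs (hypothetical): a short peptide and an imbalanced test set.
composition = amino_acid_composition("MKKLAVAA")
overall, balanced = accuracy_metrics(tp=8, tn=80, fp=10, fn=2)
```

Balanced accuracy weights the two classes equally, so it is the more informative of the two figures when suppressor and non-suppressor sequences are unequally represented in the test data.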

[1]  Qingfa Wu,et al.  Viral suppressors of RNA-based viral immunity: host targets. , 2010, Cell host & microbe.

[2]  Nir Friedman,et al.  Bayesian Network Classifiers , 1997, Machine Learning.

[3]  The UniProt Consortium,et al.  Update on activities at the Universal Protein Resource (UniProt) in 2013 , 2012, Nucleic Acids Res..

[4]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[5]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[6]  D. Nettelbeck,et al.  RNAi suppressor P19 can be broadly exploited for enhanced adenovirus replication and microRNA knockdown experiments , 2013, Scientific Reports.

[7]  Adam Godzik,et al.  Tolerating some redundancy significantly speeds up clustering of large protein databases , 2002, Bioinform..

[8]  Adam Godzik,et al.  Clustering of highly homologous sequences to reduce the size of large protein databases , 2001, Bioinform..

[9]  María Martín,et al.  Activities at the Universal Protein Resource (UniProt) , 2013, Nucleic Acids Res..

[10]  Mark A. Hall,et al.  Correlation-based Feature Selection for Machine Learning , 2003 .

[11]  C. Hornyik,et al.  A viral protein suppresses RNA silencing and binds silencing‐generated, 21‐ to 25‐nucleotide double‐stranded RNAs , 2002, The EMBO journal.

[12]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[13]  Nathalie Japkowicz,et al.  The Class Imbalance Problem: Significance and Strategies , 2000 .

[14]  K. Chou Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology , 2009 .

[15]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[16]  June Hyung Lee,et al.  Dual modes of RNA-silencing suppression by Flock House virus protein B2 , 2005, Nature Structural &Molecular Biology.

[17]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[18]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001 .

[19]  Sorin Draghici,et al.  Machine Learning and Its Applications to Biology , 2007, PLoS Comput. Biol..

[20]  Ling Zhou,et al.  Silencing suppressors: viral weapons for countering host cell defenses , 2011, Protein & Cell.

[21]  B. Berkhout,et al.  RNAi suppressors encoded by pathogenic human viruses. , 2008, The international journal of biochemistry & cell biology.

[22]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[23]  S. Ding,et al.  Viral suppressors of RNA silencing. , 2001, Current opinion in biotechnology.

[24]  A. Mallory,et al.  A viral suppressor of gene silencing in plants. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[25]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[26]  D. Silhavy,et al.  Double-Stranded RNA Binding May Be a General Plant RNA Viral Strategy To Suppress RNA Silencing , 2006, Journal of Virology.

[27]  Concha Bielza,et al.  Machine Learning in Bioinformatics , 2008, Encyclopedia of Database Systems.

[28]  O. Voinnet,et al.  Nuclear import of CaMV P6 is required for infection and suppression of the RNA silencing factor DRB4 , 2008, The EMBO journal.

[29]  K. Chou Some remarks on protein attribute prediction and pseudo amino acid composition , 2010, Journal of Theoretical Biology.

[30]  Chengqi Lin,et al.  Structural basis for RNA‐silencing suppression by Tomato aspermy virus protein 2b , 2008, EMBO reports.

[31]  M. Pazhouhandeh,et al.  Viral suppression of RNA silencing by destabilization of ARGONAUTE 1 , 2008, Plant signaling & behavior.

[32]  Dong-Sheng Cao,et al.  propy: a tool to generate various modes of Chou's PseAAC , 2013, Bioinform..

[33]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[34]  Yi Li,et al.  Viral suppression of RNA silencing , 2012, Science China Life Sciences.

[35]  Measho H. Abreha,et al.  Viral RNA silencing suppressors (RSS): novel strategy of viruses to ablate the host RNA interference (RNAi) defense system. , 2011, Virus research.

[36]  S. Mukherjee,et al.  MYMIV-AC2, a Geminiviral RNAi Suppressor Protein, Has Potential to Increase the Transgene Expression , 2012, Applied Biochemistry and Biotechnology.

[37]  M. Sioud RNA interference and innate immunity. , 2007, Advanced drug delivery reviews.

[38]  Guy Lapalme,et al.  A systematic analysis of performance measures for classification tasks , 2009, Inf. Process. Manag..

[39]  Pedro Larrañaga,et al.  A review of feature selection techniques in bioinformatics , 2007, Bioinform..

[40]  Yang Wang,et al.  Cost-sensitive boosting for classification of imbalanced data , 2007, Pattern Recognit..

[41]  Olivier Voinnet,et al.  Induction and suppression of RNA silencing: insights from viral infections , 2005, Nature Reviews Genetics.

[42]  E. Barta,et al.  Aureusvirus P14 Is an Efficient RNA Silencing Suppressor That Binds Double-Stranded RNAs without Size Specificity , 2005, Journal of Virology.

[43]  S. Mukherjee,et al.  Screening and Identification of Virus-Encoded RNA Silencing Suppressors , 2008, Methods in molecular biology.

[44]  A. Mulchandani,et al.  Electronic detection of microRNA at attomolar level with high specificity. , 2013, Analytical chemistry.

[45]  T. Csorba,et al.  RNA silencing: an antiviral mechanism. , 2009, Advances in virus research.