Identification of Defensins Employing Recurrence Quantification Analysis and Random Forest Classifiers

Defensins are a class of antimicrobial peptides synthesized in the body that act against various microbes. In this paper, we study defensins using a non-linear signal analysis method, Recurrence Quantification Analysis (RQA). We used the descriptors calculated with RQA to classify defensins with a Random Forest classifier. The RQA descriptors captured patterns peculiar to defensins, leading to an accuracy of 78.12% under 10-fold cross-validation.
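
To make the descriptor step concrete, the following is a minimal sketch of how two common RQA descriptors, recurrence rate and determinism, could be computed from a peptide sequence. Several choices here are assumptions and are not taken from the paper: the Kyte-Doolittle hydropathy scale as the numeric encoding, the embedding dimension, delay, radius, and minimum line length, and the function names `embed` and `rqa_descriptors`, which are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy scale. Assumption: the abstract does not say how
# residues were turned into numbers; this scale is used only for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def embed(series, dim, delay):
    """Time-delay embedding: overlapping vectors of length `dim`."""
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(n)]


def rqa_descriptors(sequence, dim=2, delay=1, radius=0.5, min_line=2):
    """Return (recurrence rate, determinism) for one amino-acid sequence.

    Only the upper triangle of the recurrence plot is scanned, since the
    plot is symmetric and the main diagonal is trivially recurrent.
    """
    series = [KYTE_DOOLITTLE[res] for res in sequence]
    vectors = embed(series, dim, delay)
    n = len(vectors)

    recurrent = 0  # recurrent points above the main diagonal
    in_lines = 0   # recurrent points lying on diagonal lines >= min_line

    # Each offset corresponds to one diagonal parallel to the main diagonal.
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            if math.dist(vectors[i], vectors[i + offset]) <= radius:
                recurrent += 1
                run += 1
            else:
                if run >= min_line:
                    in_lines += run
                run = 0
        if run >= min_line:  # close a run that reaches the plot edge
            in_lines += run

    pairs = n * (n - 1) // 2
    rec = recurrent / pairs if pairs else 0.0
    det = in_lines / recurrent if recurrent else 0.0
    return rec, det


# Toy, strictly periodic sequence (not a real defensin).
rec, det = rqa_descriptors("AKAKAKAKAK")
```

Descriptors computed this way for each sequence would form one feature row per peptide, which a random forest implementation (for example, scikit-learn's `RandomForestClassifier` evaluated with `cross_val_score` and `cv=10`) could then classify. Which descriptors, parameters, and software the authors actually used is not stated in the abstract.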