Prediction of allergenic proteins by means of the concept of Chou's pseudo amino acid composition and a machine learning approach.

Because proteins play a central role in inducing allergic reactions, the ability to predict their potential allergenicity has become an important issue. Bioinformatics offers valuable tools for analyzing allergens, and these approaches can complement traditional experimental techniques. This work proposes a computational method for predicting allergenic proteins. The prediction was performed using pseudo amino acid composition (PseAAC) and support vector machines (SVMs). Predictor performance was evaluated by fivefold cross-validation. The overall prediction accuracy and Matthews correlation coefficient (MCC) obtained by this method were 91.19% and 0.82, respectively. Furthermore, the minimum Redundancy and Maximum Relevance (mRMR) feature selection method was used to measure the contribution of each feature. Interestingly, in our study all six physicochemical properties (hydrophobicity, hydrophilicity, side chain mass, pK1, pK2 and pI) are present among the 10 highest-ranked features obtained from the mRMR feature selection method.
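
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how overall accuracy and MCC can be computed from the confusion counts of a binary (allergen versus non-allergen) classifier. The function name `accuracy_and_mcc` and the example counts are assumptions made for illustration; they are not taken from this study and do not reproduce its results.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix.

    tp/tn: correctly predicted allergens / non-allergens
    fp/fn: non-allergens predicted as allergens / allergens missed
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts, chosen only to exercise the function.
acc, mcc = accuracy_and_mcc(tp=90, tn=85, fp=15, fn=10)
print(f"accuracy = {acc:.2%}, MCC = {mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and ranges from -1 to 1, which makes it less sensitive to class imbalance between allergens and non-allergens.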