An ensemble of K-local hyperplanes for predicting protein-protein interactions

Predicting protein-protein interactions is a difficult and important problem in biology. In this paper, we propose a new method based on an ensemble of K-local hyperplane distance nearest neighbor (HKNN) classifiers, where each HKNN is trained using a different physicochemical property of the amino acids. Moreover, we propose a new encoding technique that combines amino acid indices with the 2-gram amino acid composition. Fusing the HKNN classifiers with the sum rule yields an improvement over other state-of-the-art methods. The approach is demonstrated by building a learning system based on experimentally validated protein-protein interactions in the human gastric bacterium Helicobacter pylori and in a human dataset.
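
The ensemble idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes NumPy, the commonly used ridge-regularized form of the HKNN distance, and a softmax over negative distances to turn each classifier's distances into scores before applying the sum rule. The function names, the parameters `k` and `lam`, and the score conversion are all assumptions made for illustration. Each "view" stands for the feature vectors obtained from one physicochemical property; the encoding itself is not reproduced here.

```python
import numpy as np

def hknn_distances(x, X_train, y_train, k=3, lam=1.0):
    """Distance from x to the local hyperplane of each class (assumed HKNN form)."""
    classes = np.unique(y_train)
    dists = np.empty(len(classes))
    for i, c in enumerate(classes):
        Xc = X_train[y_train == c]
        # K nearest neighbors of x among the training points of class c
        nearest = np.argsort(np.linalg.norm(Xc - x, axis=1))[:k]
        N = Xc[nearest]
        centroid = N.mean(axis=0)
        V = (N - centroid).T  # columns span the local hyperplane
        # Ridge-regularized least squares for the hyperplane coordinates
        A = V.T @ V + lam * np.eye(V.shape[1])
        alpha = np.linalg.solve(A, V.T @ (x - centroid))
        dists[i] = np.linalg.norm(x - centroid - V @ alpha)
    return classes, dists

def scores_from_distances(d):
    """Assumed conversion: softmax of negative distances (smaller distance, higher score)."""
    z = -d - np.max(-d)
    e = np.exp(z)
    return e / e.sum()

def ensemble_predict(x_views, train_views, y_train, k=3, lam=1.0):
    """Train-free fusion: one HKNN per feature view, combined with the sum rule."""
    fused = None
    for x, X in zip(x_views, train_views):
        classes, d = hknn_distances(x, X, y_train, k, lam)
        s = scores_from_distances(d)
        fused = s if fused is None else fused + s  # sum rule
    return classes[np.argmax(fused)]
```

Here `ensemble_predict` takes one query vector and one training matrix per view, all sharing the same labels, and returns the class whose summed score is largest.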
