Different protein-protein interface patterns predicted by different machine learning methods

Different types of protein-protein interactions give rise to different protein-protein interface patterns, and different machine learning methods are suited to different types of data. Are different interface patterns, then, also best predicted by different machine learning methods? Here, four machine learning methods were used to predict protein-protein interface residue pairs across different interface patterns. The methods performed differently on different types of proteins, which suggests that each machine learning method tends to favor particular protein-protein interface patterns. We used analysis of variance (ANOVA) and variable selection to support this result. Our proposed methods, which combine the strengths of the individual methods, also achieved good prediction performance compared with the single methods. Beyond protein-protein interaction prediction, this idea can be extended to other research areas such as protein structure prediction and design.
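The abstract does not state how the ANOVA was set up. One plausible design, offered here only as an assumption, treats the machine learning method and the interface pattern as two crossed factors and tests their interaction. A large interaction F statistic means the ranking of the methods changes from one pattern to another, which is the kind of effect the abstract describes. The Python sketch below computes a balanced two-way ANOVA with replication using only the standard library. The function name `two_way_anova`, the use of per-complex scores such as AUC, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from statistics import fmean


def two_way_anova(scores):
    """Balanced two-way ANOVA with replication: method x interface pattern.

    Hypothetical helper. scores[i][j] holds n performance values (for example,
    per-complex AUC) for method i on interface pattern j. The design must have
    at least 2 methods, 2 patterns, and the same n >= 2 values in every cell.
    Returns the F statistic and (df_effect, df_error) for each effect.
    """
    a, b = len(scores), len(scores[0])
    n = len(scores[0][0])
    balanced = all(
        len(row) == b and all(len(cell) == n for cell in row) for row in scores
    )
    if a < 2 or b < 2 or n < 2 or not balanced:
        raise ValueError("need a balanced design: >= 2 levels per factor, >= 2 replicates per cell")

    # Cell, marginal and grand means (these equal observation means when balanced).
    cell = [[fmean(c) for c in row] for row in scores]
    grand = fmean(m for row in cell for m in row)
    by_method = [fmean(row) for row in cell]
    by_pattern = [fmean(cell[i][j] for i in range(a)) for j in range(b)]

    # Sums of squares for the two main effects and their interaction.
    ss = {
        "method": b * n * sum((m - grand) ** 2 for m in by_method),
        "pattern": a * n * sum((m - grand) ** 2 for m in by_pattern),
        "interaction": n * sum(
            (cell[i][j] - by_method[i] - by_pattern[j] + grand) ** 2
            for i in range(a) for j in range(b)
        ),
    }
    # Within-cell (error) sum of squares.
    ss_error = sum(
        (y - cell[i][j]) ** 2
        for i in range(a) for j in range(b) for y in scores[i][j]
    )
    df = {"method": a - 1, "pattern": b - 1, "interaction": (a - 1) * (b - 1)}
    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error
    if ms_error == 0:
        raise ValueError("zero within-cell variance; F is undefined")

    # F = (effect mean square) / (error mean square).
    return {
        effect: {"F": (ss[effect] / df[effect]) / ms_error,
                 "df": (df[effect], df_error)}
        for effect in ss
    }


# Toy, made-up scores: 2 methods x 2 patterns, 2 complexes per cell.
# Method 0 is better on pattern 0, method 1 on pattern 1 (a crossed ranking).
toy = [[[0.80, 0.82], [0.50, 0.52]],
       [[0.50, 0.52], [0.80, 0.82]]]
print(two_way_anova(toy)["interaction"])
```

To turn an F statistic into a p-value, one option is `scipy.stats.f.sf(F, df_effect, df_error)`. Unbalanced designs (unequal numbers of complexes per pattern) need a different decomposition, for example the Type II sums of squares available through `statsmodels.stats.anova.anova_lm` with `typ=2`.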
