An accurate classification of native and non-native protein-protein interactions using supervised and semi-supervised learning approaches

Progress in experimental and computational structural biology has led to rapid growth in the number of experimentally resolved structures and computational models of protein-protein interactions. However, distinguishing physiological from non-physiological interactions remains a challenging problem. In this work, we address two related interface classification problems. The first is the classification of physiological versus crystal-packing interactions. The second is the discrimination of physiological interactions, or accurate models of them, from decoys produced by inaccurate docking models. We defined a universal set of interface features and employed supervised and semi-supervised learning approaches to accurately classify the interactions in both problems. Furthermore, we formulated the second problem as a semi-supervised learning problem and employed a transductive SVM to improve classification accuracy. Finally, we showed that scoring functions derived from the resulting classifiers can improve the accuracy of docking methods.
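
The abstract names a transductive SVM without giving details. The sketch below illustrates the general idea under stated assumptions and is not the authors' method. A linear SVM is first trained on labeled interfaces. Its scores are then used to guess labels for unlabeled interfaces, and those guesses are refined by label switching while the weight on unlabeled points is gradually raised. All names are hypothetical: `train_svm`, `train_tsvm`, `decision`, and the toy 2-D feature vectors that stand in for interface descriptors. The solver is full-batch subgradient descent on the primal hinge loss, and the starting unlabeled weight of 0.01 is an illustrative choice. The loop is a simplified take on Joachims's TSVM, not the SVMlight implementation.

```python
"""Hypothetical sketch: linear SVM plus a simplified transductive SVM (TSVM).

Assumptions (not from the paper): toy 2-D vectors stand in for interface
features; the primal objective is minimised by full-batch subgradient descent;
the transductive loop is a simplified version of Joachims-style label switching.
"""
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def decision(w: Vector, b: float, x: Vector) -> float:
    """Signed score f(x) = w.x + b; positive class assumed to mean 'native'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_svm(xs: List[Vector], ys: List[int], costs: List[float],
              lam: float = 0.01, lr: float = 0.05,
              epochs: int = 500) -> Tuple[List[float], float]:
    """Minimise lam/2*|w|^2 + (1/n) * sum_i cost_i * max(0, 1 - y_i f(x_i))."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for x, y, c in zip(xs, ys, costs):
            if y * decision(w, b, x) < 1.0:  # margin violated: hinge is active
                for j in range(dim):
                    gw[j] -= c * y * x[j] / n
                gb -= c * y / n
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b


def train_tsvm(xl: List[Vector], yl: List[int], xu: List[Vector],
               c: float = 1.0, c_star: float = 1.0, max_swaps: int = 100):
    """Return (w, b, guessed labels for xu) from a simplified TSVM."""
    w, b = train_svm(xl, yl, [c] * len(xl))
    # Initial guess: keep the labeled positive fraction among unlabeled points.
    num_pos = round(len(xu) * sum(1 for y in yl if y > 0) / len(yl))
    order = sorted(range(len(xu)), key=lambda i: -decision(w, b, xu[i]))
    yu = [-1] * len(xu)
    for i in order[:num_pos]:
        yu[i] = 1
    cu = min(1e-2, c_star)  # small initial weight on unlabeled points
    swaps = 0
    while True:
        while True:
            w, b = train_svm(xl + xu, yl + yu, [c] * len(xl) + [cu] * len(xu))
            slack = [max(0.0, 1.0 - y * decision(w, b, x))
                     for x, y in zip(xu, yu)]
            # Look for a (+, -) pair whose labels are probably both wrong.
            pair = next(((m, l) for m in range(len(xu)) for l in range(len(xu))
                         if yu[m] == 1 and yu[l] == -1 and slack[m] > 0
                         and slack[l] > 0 and slack[m] + slack[l] > 2.0), None)
            if pair is None or swaps >= max_swaps:
                break
            m, l = pair
            yu[m], yu[l] = -1, 1  # swapping keeps the positive count fixed
            swaps += 1
        if cu >= c_star:
            return w, b, yu
        cu = min(2 * cu, c_star)  # raise the unlabeled weight and repeat


# Toy usage: +1 = hypothetical "native" interface, -1 = hypothetical decoy.
labeled_x = [[2.0, 2.0], [3.0, 2.0], [-2.0, -2.0], [-3.0, -2.0]]
labeled_y = [1, 1, -1, -1]
unlabeled_x = [[2.5, 1.5], [1.5, 2.5], [-2.5, -1.5], [-1.5, -2.5]]
w, b, guessed = train_tsvm(labeled_x, labeled_y, unlabeled_x)
print(guessed)
```

Because `decision` returns a continuous score rather than just a class label, the same value could in principle be used to rank docking decoys. The abstract describes this kind of use for the obtained classifiers. How the authors actually derived their scoring functions is not stated here.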
